Methods and compositions for the making and using of guide nucleic acids

ABSTRACT

Provided herein are methods and compositions to make guide nucleic acids (gNAs), nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs from any source nucleic acid. Also provided herein are methods and compositions to use the resulting gNAs, nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs in a variety of applications.

CROSS-REFERENCE

This application is a continuation of U.S. application Ser. No. 15/742,862, filed Jan. 8, 2018, which is a U.S. National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2016/065420, filed Dec. 7, 2016, which claims the benefit of U.S. Provisional Application No. 62/264,262, filed Dec. 7, 2015, and of U.S. Provisional Application No. 62/298,963, filed Feb. 23, 2016, each of which is hereby incorporated by reference in its entirety.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING

The present application is being filed with a Sequence Listing in electronic format. The content of the ASCII text file of the sequence listing named “155949-00086_Sequence_Listing.TXT”, which is 797 kb in size, was created on Aug. 17, 2020, and is electronically submitted via EFS-Web herewith the application, is incorporated herein by reference in its entirety.

BACKGROUND

Human clinical DNA samples and sample libraries such as cDNA libraries derived from RNA contain highly abundant sequences that have little informative value and increase the cost of sequencing. While methods have been developed to deplete these unwanted sequences (e.g., via hybridization capture), these methods are often time-consuming and can be inefficient.

Although guide nucleic acid (gNA)-mediated nuclease systems (such as guide RNA (gRNA)-mediated Cas systems) can efficiently deplete any target DNA, targeted depletion of very high numbers of unique DNA molecules is not feasible. For example, a sequencing library derived from human blood may contain >99% human genomic DNA. Using a gRNA-mediated Cas9 system-based method to deplete this genomic DNA to detect an infectious agent circulating in the human blood would require extremely high numbers of gRNAs (about 10-100 million gRNAs) in order to ensure that a gRNA will be present every 30-50 base pairs (bp) and that no target DNA will be missed. Very large numbers of gRNAs can be predicted computationally and then synthesized chemically, but at prohibitive cost.
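The gRNA counts quoted above follow from simple tiling arithmetic. The sketch below assumes a haploid human genome of about 3.1 billion bp (a figure not given in the text) and one gRNA per target site on a single strand. Under those assumptions, one site every 50 bp needs about 62 million gRNAs and one every 30 bp about 103 million, on the order of the range stated above.

```python
# Assumed genome size (not given in the text): ~3.1 Gb haploid human genome.
GENOME_BP = 3_100_000_000


def grnas_needed(genome_bp: int, spacing_bp: int) -> int:
    """gRNAs required to place one target site every `spacing_bp` bases (ceiling division)."""
    return -(-genome_bp // spacing_bp)


for spacing in (30, 50):
    print(f"one site every {spacing} bp -> {grnas_needed(GENOME_BP, spacing):,} gRNAs")
```

Ceiling division is used so that a partial final window still receives a gRNA.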

Therefore, there is a need in the art for a cost-effective method of converting any DNA into a gNA (e.g., gRNA) library to enable, for example, genome-wide depletion of unwanted DNA sequences from those of interest, without prior knowledge about their sequences. Provided herein are methods and compositions that address this need.

SUMMARY

Provided herein are compositions and methods to generate gNAs and collections of gNAs from any source nucleic acid. For example, gRNAs and collections of gRNAs can be generated from source DNA, such as genomic DNA. Such gNAs and collections of the same are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest, genome-wide labeling, genome-wide editing, genome-wide functional screens, and genome-wide regulation.

In one aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size. In another aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the second segment varies from 15-250 bp across the collection of nucleic acids. In some embodiments, at least 10% of the second segments in the collection are greater than 21 bp. In some embodiments, the size of the second segment is not 20 bp. In some embodiments, the size of the second segment is not 21 bp. In some embodiments, the collection of nucleic acids is a collection of DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. 
In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10² unique nucleic acid molecules. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment encodes a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase. 
In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence. In some embodiments, a plurality of third segments of the collection encode a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode a second nucleic acid-guided nuclease system protein binding sequence. In some embodiments, the third segments of the collection encode a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
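As a hedged illustration of how targeting sequences sit relative to a PAM, the following Python sketch scans both strands of a DNA string for the AGG, CGG, and TGG PAMs listed above and reports the stretch of nucleotides immediately 5′ of each one. The function names and the default length of 20 nt are illustrative assumptions; because the collections described here vary in size, the length is a parameter.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targeting_sequences(seq: str, length: int = 20):
    """List (strand, start, targeting_sequence, pam) for every `length`-nt stretch
    lying immediately 5' of an AGG, CGG or TGG PAM on either strand.
    `start` is 0-based on the strand that was scanned (the reverse complement for '-')."""
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so that overlapping PAMs are all reported.
        for m in re.finditer(r"(?=[ACT]GG)", s):
            i = m.start()
            if i >= length:  # need a full-length stretch 5' of the PAM
                hits.append((strand, i - length, s[i - length:i], s[i:i + 3]))
    return hits


# Usage: a 20-nt stretch followed by TGG on the top strand.
print(find_targeting_sequences("ACGTACGTACGTACGTACGT" + "TGG"))
```

Passing a larger `length` (for example 30) returns longer candidate targeting sequences from the same PAM positions.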

In another aspect, the invention described herein provides a collection of guide RNAs (gRNAs), comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the first segment varies from 15-250 bp across the collection of gRNAs. In some embodiments, at least 10% of the first segments in the collection are greater than 21 bp. In some embodiments, the size of the first segment is not 20 bp. In some embodiments, the size of the first segment is not 21 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the first segments is RNA encoded by sequences selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least (unique gRNAs. In some embodiments, the gRNAs comprise cytosine, guanine, and adenine. In some embodiments, a subset of the gRNAs further comprises thymine. In some embodiments, a subset of the gRNAs further comprises uracil. In some embodiments, the first segment is at least 80% complementary to a target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG. 
In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment comprises the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the second segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase. In some embodiments, the second segment comprises a Cas9-binding sequence. In some embodiments, at least 10% of the gRNAs in the collection vary in their 5′ terminal-end sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced every 10,000 bp or less across the genome of an organism. In some embodiments, a plurality of second segments of the collection comprise a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the second segments of the collection comprise a second nucleic acid-guided nuclease system protein binding sequence. In some embodiments, the second segments of the collection comprise a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins. 
In some embodiments, a plurality of the gRNAs of the collection are attached to a substrate. In some embodiments, a plurality of the gRNAs of the collection comprise a label. In some particular embodiments, a plurality of the gRNAs of the collection comprise different labels.
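The 80% complementarity criterion can be checked position by position. This minimal sketch assumes an ungapped, equal-length alignment, counts only Watson-Crick pairs (no G-U wobble), and accepts T as well as U in the guide, since some gRNAs above may comprise thymine. `target_dna` is the strand the guide hybridizes to, written 5′ to 3′; the helper names are hypothetical.

```python
# Watson-Crick pairs between a guide base (RNA, or DNA-like T) and a target DNA base.
PAIRS = {("A", "T"), ("U", "A"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementary(guide: str, target_dna: str) -> float:
    """Percent of positions where guide (5'->3') pairs with target_dna read 3'->5'."""
    if len(guide) != len(target_dna):
        raise ValueError("sequences must be the same length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, reversed(target_dna)))
    return 100.0 * paired / len(guide)


def meets_threshold(guide: str, target_dna: str, threshold: float = 80.0) -> bool:
    """True if the guide is at least `threshold` percent complementary to the target."""
    return percent_complementary(guide, target_dna) >= threshold


print(percent_complementary("ACGU", "ACGT"))  # 100.0: every position pairs
print(percent_complementary("ACGU", "ACGA"))  # 75.0: one mismatch in four
```

Reading the target in reverse encodes the antiparallel orientation of the guide-target duplex.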

In another aspect, the invention described herein provides a nucleic acid comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the targeting sequence is greater than 30 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the nucleic acid is DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. In some embodiments, the targeting sequence is directed at abundant or repetitive DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the target genomic sequence of interest is 5′ upstream of a PAM sequence. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. 
In some embodiments, the third segment encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.
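To make the three-segment layout concrete, the sketch below concatenates a promoter, a targeting sequence longer than 30 bp, and DNA encoding SEQ ID NO: 1, then derives the RNA expected from run-off transcription. The T7 promoter sequence used here and the extra 5′ G (added because T7 RNA polymerase initiates most efficiently on G) are assumptions of this sketch, not part of the disclosure; the helper names are hypothetical.

```python
# Assumed T7 promoter; transcription is taken to begin at the base right after it (+1).
T7_PROMOTER = "TAATACGACTCACTATA"
SEQ_ID_NO_1 = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU"


def build_template(targeting_dna: str) -> str:
    """First segment (promoter) + second segment (targeting) + third segment (scaffold DNA)."""
    scaffold_dna = SEQ_ID_NO_1.replace("U", "T")
    # Assumption: prepend G when the targeting sequence does not already start with G.
    lead = "" if targeting_dna.startswith("G") else "G"
    return T7_PROMOTER + lead + targeting_dna + scaffold_dna


def run_off_transcript(template: str) -> str:
    """RNA expected from run-off transcription: everything after the promoter, T -> U."""
    if not template.startswith(T7_PROMOTER):
        raise ValueError("template does not begin with the assumed T7 promoter")
    return template[len(T7_PROMOTER):].replace("T", "U")


targeting = "ACGT" * 8  # 32 bp, i.e. greater than 30 bp as in this aspect
rna = run_off_transcript(build_template(targeting))
print(len(targeting), rna.endswith(SEQ_ID_NO_1))
```

The resulting RNA corresponds to a gRNA whose first segment is the transcribed targeting sequence and whose second segment is SEQ ID NO: 1.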

In another aspect, the invention described herein provides a guide RNA comprising: a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the gRNA comprises an adenine, a guanine, and a cytosine. In some embodiments, the gRNA further comprises a thymine. In some embodiments, the gRNA further comprises a uracil. In some embodiments, the size of the first RNA segment is between 30 and 250 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the first segment is at least 80% complementary to the target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the sequence of the second segment comprises GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. 
In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase. In some embodiments, the second segment is a Cas9-binding sequence.

In another aspect, the invention provides a complex comprising a nucleic acid-guided nuclease system protein and a guide RNA comprising: a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.

In another aspect, the invention described herein provides a method for depleting and partitioning targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample, comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collections of gRNAs provided herein; and (ii) nucleic acid-guided nuclease system proteins. In some embodiments, the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins. In some embodiments, the CRISPR/Cas system proteins are Cas9 proteins.
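A crude in-silico model of the depletion step: a read is treated as cleaved, and therefore depleted, if it carries an exact copy of any gRNA targeting sequence immediately followed by an AGG, CGG, or TGG PAM on either strand; every other read (for example, a non-host read) is retained. Real cleavage tolerates some mismatches and depends on enzyme activity, so the exact-match rule is a simplifying assumption, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def partition_reads(reads, targeting_sequences, pams=("AGG", "CGG", "TGG")):
    """Split reads into (cleaved, retained). A read counts as cleaved when either
    strand contains an exact targeting sequence immediately followed by a PAM."""
    sites = {t + p for t in targeting_sequences for p in pams}
    cleaved, retained = [], []
    for read in reads:
        strands = (read, revcomp(read))
        if any(site in strand for strand in strands for site in sites):
            cleaved.append(read)
        else:
            retained.append(read)
    return cleaved, retained


spacer = "ACGTACGTACGTACGTACGT"
host_read = "TTT" + spacer + "AGG" + "TTT"
other_read = "GATTACA" * 5
cleaved, retained = partition_reads([host_read, other_read, revcomp(host_read)], [spacer])
print(len(cleaved), len(retained))
```

Checking both orientations mirrors the fact that a double-stranded target can be recognized from whichever strand carries the PAM.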

In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; (b) performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, without completely removing the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; (c) ligating adapters comprising a recognition sequence to the resulting DNA molecules of step b; (d) contacting the DNA molecules of step c with a restriction enzyme that recognizes the recognition sequence of step c, thereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5′ to the PAM sequence and removing the PAM sequence and the adapter containing the enzyme recognition site; and (e) ligating the resulting double stranded DNA fragments of step d with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, thereby generating a plurality of DNA fragments, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas nucleic acid-guided nuclease system protein. In some embodiments, the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence. In some embodiments, the regulatory sequence comprises a promoter. In some embodiments, the promoter comprises a T7, SP6, or T3 sequence. 
In some embodiments, the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA. In some embodiments, the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral. In some embodiments, the DNA segments encoding a targeting sequence are at least 22 bp. In some embodiments, the DNA segments encoding a targeting sequence are in the size range of 15-250 bp. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, thereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, thereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence, wherein residual nucleotides from the HGG and/or CCD sequences are left behind. In some embodiments, step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence. In some embodiments, the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme and a T7 Endonuclease I enzyme. 
In some embodiments, step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang. In some embodiments, the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped. In some embodiments, the restriction enzyme of step (d) is MlyI. In some embodiments, the restriction enzyme of step (d) is BaeI. In some embodiments, step (d) further comprises contacting the DNA molecules with an XhoI enzyme. In some embodiments, in step (e) the DNA encoding a nucleic acid-guided nuclease system protein-binding sequence encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
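The role of MlyI in step (d) can be modeled on a string. MlyI recognizes GAGTC and cuts 5 nt 3′ of it, leaving blunt ends. In the hypothetical adapter layout below, the site lies on the bottom strand (so it reads GACTC on the top strand) two nucleotides past the PAM, so the 5-nt offset spans the PAM plus those two nucleotides and the cut falls immediately 5′ of the PAM, releasing the targeting sequence. The adapter sequence and helper names are illustrative assumptions.

```python
MLYI_SITE = "GAGTC"     # MlyI recognition sequence (cuts 5 nt 3' of the site, blunt)
MLYI_SITE_RC = "GACTC"  # the same site seen on the top strand when it lies on the bottom strand
CUT_OFFSET = 5


def mlyi_cut_positions(top):
    """Blunt cut positions on a linear duplex given as its top strand.
    A position p means the cut lies between top[p-1] and top[p]."""
    cuts = set()
    for i in range(len(top) - len(MLYI_SITE) + 1):
        window = top[i:i + len(MLYI_SITE)]
        if window == MLYI_SITE:       # site on top strand: cut to the right
            pos = i + len(MLYI_SITE) + CUT_OFFSET
        elif window == MLYI_SITE_RC:  # site on bottom strand: cut to the left
            pos = i - CUT_OFFSET
        else:
            continue
        if 0 < pos < len(top):
            cuts.add(pos)
    return sorted(cuts)


def digest(top):
    """Top-strand pieces left after cutting at every MlyI position."""
    bounds = [0] + mlyi_cut_positions(top) + [len(top)]
    return [top[a:b] for a, b in zip(bounds, bounds[1:])]


spacer = "ACGTACGTACGTACGTACGT"
fragment = spacer + "TGG" + "AA" + "GACTC" + "TTTT"  # hypothetical adapter after the PAM
print(digest(fragment))
```

Because the cut is blunt and lands exactly at the targeting-sequence/PAM boundary, the released fragment can be ligated directly to DNA encoding the protein-binding sequence, as in step (e).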

In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing a plurality of double stranded DNA molecules, each comprising a sequence of interest, an NGG site, and its complementary CCN site; (b) contacting the molecules with an enzyme capable of creating a nick in a single strand at a CCN site, thereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest 5′ to the NGG site, wherein the DNA molecules are nicked at the CCD sites; (c) contacting the nicked double stranded DNA molecules with an endonuclease, thereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest, wherein the fragments comprise a terminal overhang; (d) contacting the double stranded DNA fragments with an enzyme without 5′ to 3′ exonuclease activity to blunt end the double stranded DNA fragments, thereby generating a plurality of blunt ended double stranded fragments, each comprising a sequence of interest; (e) contacting the blunt ended double stranded fragments of step d with an enzyme that cleaves the terminal NGG site; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, thereby generating a plurality of DNA fragments, each comprising a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the plurality of double stranded DNA molecules have a regulatory sequence 5′ upstream of the NGG sites. In some embodiments, the regulatory sequence comprises a T7, SP6, or T3 sequence. 
In some embodiments, the NGG site comprises AGG, CGG, or TGG, and the CCN site comprises CCT, CCG, or CCA. In some embodiments, the plurality of double stranded DNA molecules, each comprising a sequence of interest, comprise sheared fragments of genomic DNA. In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, avian, bacterial, or viral. In some embodiments, the plurality of double stranded DNA molecules in step (a) are at least 500 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, the enzyme in step d is a T4 DNA Polymerase. In some embodiments, in step f the DNA encoding a nucleic acid-guided nuclease system protein-binding sequence encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, step e additionally comprises ligating adaptors carrying a MlyI recognition site and digesting with the MlyI enzyme. In some embodiments, the sequence of interest is spaced every 10,000 bp or less across the genome.
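The sketch below maps where Nt.CviPII, which nicks immediately 5′ of CCD (D = A, G, or T), would nick each strand of a duplex given as its top strand. A CCD on the bottom strand reads as HGG (H = A, C, or T) on the top strand, so the bottom-strand nick lands immediately 3′ of each HGG PAM; this is consistent with fragments that end in a sequence of interest followed by HGG. Positions are boundaries in top-strand coordinates, and the function is a hypothetical helper.

```python
def cvipii_nicks(top):
    """Return (strand, position) for each Nt.CviPII nick, assuming the enzyme nicks
    immediately 5' of CCD (D = A, G or T). A nick at position p lies between
    top[p-1] and top[p] in top-strand coordinates."""
    nicks = []
    for i in range(len(top) - 2):
        tri = top[i:i + 3]
        if tri[:2] == "CC" and tri[2] in "AGT":
            # CCD on the top strand: nick just before the first C.
            nicks.append(("top", i))
        if tri[0] in "ACT" and tri[1:] == "GG":
            # HGG on the top strand is CCD on the bottom strand, whose 5' end
            # faces the right-hand side, so the bottom nick falls just after HGG.
            nicks.append(("bottom", i + 3))
    return nicks


spacer = "ACGTACGTACGTACGTACGT"
print(cvipii_nicks(spacer + "TGG" + "AAA"))  # bottom-strand nick right after the PAM
```

Turning these single-strand nicks into double-strand fragments (the T7 Endonuclease I step) is not modeled here.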

In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence and a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing genomic DNA comprising a plurality of sequences of interest, comprising NGG and CCN sites; (b) contacting the genomic DNA with an enzyme capable of creating nicks in the genomic DNA, thereby generating nicked genomic DNA, nicked at CCN sites; (c) contacting the nicked genomic DNA with an endonuclease, thereby generating double stranded DNA fragments with an overhang; (d) ligating the DNA with overhangs from step c to a Y-shaped adapter, thereby introducing a restriction enzyme recognition sequence only 3′ of the NGG site and a regulatory sequence 5′ of the sequence of interest; (e) contacting the product from step d with an enzyme that cleaves away the NGG site together with the adaptor carrying the enzyme recognition sequence; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, thereby generating a plurality of DNA fragments, each comprising a sequence of interest ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the NGG site comprises AGG, CGG, or TGG, and the CCN site comprises CCT, CCG, or CCA. In some embodiments, the regulatory sequence comprises a promoter sequence. In some embodiments, the promoter sequence comprises a T7, SP6, or T3 sequence. In some embodiments, the DNA fragments are sheared fragments of genomic DNA.

In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, or viral. In some embodiments, the fragments are at least 200 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, step d further comprises PCR amplification of the adaptor-ligated DNA. In some embodiments, in step f, the DNA encoding a nucleic acid-guided nuclease system protein-binding sequence encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the enzyme removing the NGG site in step e is MlyI. In some embodiments, the target of interest of the collection is spaced every 10,000 bp or less across the genome.
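Whether a collection meets the "every 10,000 bp or less" spacing can be checked from the genomic coordinates of its target sites. This minimal sketch interprets spacing as the distance between consecutive sites on one sequence and ignores the regions before the first and after the last site; that interpretation is an assumption rather than part of the text, and the helper names are hypothetical.

```python
def largest_gap(positions):
    """Largest distance (bp) between consecutive target-site positions."""
    ordered = sorted(positions)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


def spaced_within(positions, max_spacing=10_000):
    """True if no two consecutive target sites are more than max_spacing bp apart."""
    return largest_gap(positions) <= max_spacing


print(spaced_within([0, 9_000, 18_500]))  # True: gaps are 9,000 and 9,500 bp
print(spaced_within([0, 5_000, 15_001]))  # False: one gap is 10,001 bp
```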

In another aspect, the invention provides kits and/or reagents useful for performing a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, as described in the embodiments herein.

In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.

In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a CRISPR/Cas system protein-binding sequence.

In another aspect, the invention described herein provides a kit comprising a collection of guide RNAs comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.

In another aspect, the invention described herein provides a method of making a collection of guide nucleic acids, comprising: a. obtaining abundant cells in a source sample; b. collecting nucleic acids from said abundant cells; and c. preparing a collection of guide nucleic acids (gNAs) from said nucleic acids. In some embodiments, said abundant cells comprise cells from one or more of the most abundant bacterial species in said source sample. In some embodiments, said abundant cells comprise cells from more than one species. In some embodiments, said abundant cells comprise human cells. In some embodiments, said abundant cells comprise animal cells. In some embodiments, said abundant cells comprise plant cells. In some embodiments, said abundant cells comprise bacterial cells. In some embodiments, the method further comprises contacting nucleic acid-guided nucleases with said collection of gNAs to form nucleic acid-guided nuclease-gNA complexes. In some embodiments, the method further comprises using said nucleic acid-guided nuclease-gNA complexes to cleave target nucleic acids at target sites, wherein said gNAs are complementary to said target sites. In some embodiments, said target nucleic acids are from said source sample. In some embodiments, a species of said target nucleic acids is the same as a species of said source sample. In some embodiments, said species of said target nucleic acids and said species of said source sample are human. In some embodiments, said species of said target nucleic acids and said species of said source sample are animal. In some embodiments, said species of said target nucleic acids and said species of said source sample are plant.
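One way to decide which cells count as "abundant" is to rank species by read counts from a preliminary survey of the source sample and keep the most abundant ones until they account for a chosen share of reads. The helper below is a hypothetical sketch; the coverage cutoffs and species names are illustrative only and are not taken from the text.

```python
def most_abundant(read_counts, fraction=0.8):
    """Return species, most abundant first, until together they cover at least
    `fraction` of all reads. Hypothetical helper for choosing 'abundant cells'."""
    total = sum(read_counts.values())
    chosen, covered = [], 0
    for species, count in sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= fraction * total:
            break
        chosen.append(species)
        covered += count
    return chosen


counts = {"Homo sapiens": 900, "Escherichia coli": 80, "other": 20}
print(most_abundant(counts))        # ['Homo sapiens']
print(most_abundant(counts, 0.95))  # ['Homo sapiens', 'Escherichia coli']
```

Nucleic acids from the selected species would then serve as the source for the gNA collection.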

In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing double-stranded breaks at proximal nicks; and c. repairing overhangs of said double-stranded breaks, thereby producing a double-stranded fragment comprising (i) a targeting sequence and (ii) said nicking enzyme recognition site. In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing a nick; and c. synthesizing a new strand from said nick, thereby producing a single-stranded fragment of said source DNA comprising a targeting sequence. In some embodiments, the method further comprises producing a double-stranded fragment comprising said targeting sequence from said single-stranded fragment. In some embodiments, said producing said double-stranded fragment comprises random priming and extension. In some embodiments, said random priming is conducted with a primer comprising a random n-mer region and a promoter region. In some embodiments, said random n-mer region is a random hexamer region. In some embodiments, said random n-mer region is a random octamer region. In some embodiments, said promoter region is a T7 promoter region. In some embodiments, the method further comprises ligating a nuclease recognition site nucleic acid comprising a nuclease recognition site to said double-stranded fragment. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to the length of said nicking enzyme recognition sites.
In some embodiments, said nuclease recognition site is an MlyI recognition site. In some embodiments, said nuclease recognition site is a BaeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease, thereby removing said nicking enzyme recognition site from said double-stranded fragment. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence. In some embodiments, said length of said targeting sequence is 20 base pairs. In some embodiments, said nuclease recognition site is an MmeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence plus a length of said nicking enzyme recognition sites. In some embodiments, said length of said targeting sequence plus a length of said nicking enzyme recognition sites is 23 base pairs. In some embodiments, said nuclease recognition site is an EcoP15I recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site.
In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
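The nicking and Type IIS trimming steps above can be pictured at the sequence level. The following is a minimal Python sketch under stated assumptions, not the disclosed protocol: it scans only the top strand, treats each fragment released by strand-displacing extension as running from one nick to the next, and ignores the staggered bottom-strand cut of real Type IIS enzymes. The motif `GAGTC`, the 4-base nick offset, and the adapter string `GGTCCAAC` (carrying the placeholder site `TCCAAC`) are hypothetical illustrative parameters, not values taken from this disclosure.

```python
def nick_positions(seq: str, motif: str, nick_offset: int) -> list[int]:
    """0-based top-strand nick positions, nick_offset bases 3' of each motif match."""
    seq, motif = seq.upper(), motif.upper()
    nicks = []
    i = seq.find(motif)
    while i != -1:
        nick = i + len(motif) + nick_offset
        if nick < len(seq):  # a nick past the 3' end releases nothing
            nicks.append(nick)
        i = seq.find(motif, i + 1)  # i + 1 allows overlapping matches
    return nicks


def displaced_fragments(seq: str, motif: str, nick_offset: int) -> list[str]:
    """Single-stranded pieces displaced by polymerase extension from each nick.

    Each piece runs from one nick to the next nick on the same strand, or to the
    3' end of the molecule, so it may carry the next recognition site with it.
    """
    seq = seq.upper()
    bounds = nick_positions(seq, motif, nick_offset) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def type_iis_retained(molecule: str, site: str, cut_distance: int) -> str:
    """Bases between the end of a recognition site and a cut cut_distance bases downstream.

    Top strand only; the staggered bottom-strand cut of real Type IIS enzymes is ignored.
    """
    molecule, site = molecule.upper(), site.upper()
    start = molecule.find(site)
    if start == -1:
        raise ValueError("recognition site not found")
    begin = start + len(site)
    end = begin + cut_distance
    if end > len(molecule):
        raise ValueError("cut would fall beyond the end of the molecule")
    return molecule[begin:end]


# Toy example: two hypothetical sites in a 38-base top strand.
source = "GAGTC" + "ACGT" + "TTTTTGGGGG" + "GAGTC" + "ACGT" + "CCCCCAAAAA"
print(displaced_fragments(source, motif="GAGTC", nick_offset=4))
# ['TTTTTGGGGGGAGTCACGT', 'CCCCCAAAAA']
```

Note that the first displaced piece carries the downstream recognition site, consistent with a fragment comprising both a targeting sequence and the nicking enzyme recognition site. Setting `cut_distance=20` mirrors the embodiment in which the nuclease (e.g., MmeI) cuts at a distance equal to a 20 bp targeting sequence; the 23 bp EcoP15I embodiment would use `cut_distance=23`.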

In another aspect, the invention described herein provides a kit comprising all essential reagents and instructions for carrying out the methods of aspects of the invention described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.

FIG. 2 illustrates another exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.

FIG. 3 illustrates an exemplary scheme for nicking of DNA and subsequent treatment with polymerase to generate blunt ends.

FIG. 4 illustrates an exemplary scheme for sequential production of a library of gNAs using three adapters.

FIG. 5 illustrates an exemplary scheme for sequential production of a library of gNAs using one adapter and one oligo.

FIG. 6 illustrates an exemplary scheme for generation of a large pool of DNA fragments with blunt ends using Nicking Enzyme Mediated DNA Amplification (NEMDA).

FIG. 7 illustrates an exemplary scheme for generation of a large pool of gNAs using Nicking Enzyme Mediated DNA Amplification (NEMDA).

DETAILED DESCRIPTION OF THE INVENTION

There is a need in the art for a scalable, low-cost approach to generate large numbers of diverse guide nucleic acids (gNAs) (e.g., gRNAs, gDNAs) for a variety of downstream applications.

Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.

Numeric ranges are inclusive of the numbers defining the range.

For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.

As used herein, the singular forms “a”, “an”, and “the” include plural references unless indicated otherwise.

It is understood that aspects and embodiments of the invention described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.

The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se.

The term “nucleic acid,” as used herein, refers to a molecule comprising one or more nucleic acid subunits. A nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same. A nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof. A nucleic acid may be single-stranded and/or double-stranded.

The nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

The terms “nucleic acids” and “polynucleotides” are used interchangeably herein. Polynucleotide is used to describe a nucleic acid polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence-specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Naturally-occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively). DNA and RNA have deoxyribose and ribose sugar backbones, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. A locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired. The term “unstructured nucleic acid,” or “UNA,” refers to a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability.
For example, an unstructured nucleic acid may contain a G′ residue and a C′ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively. Unstructured nucleic acid is described in US20050233340, which is incorporated by reference herein for disclosure of UNA.

The term “oligonucleotide” as used herein denotes a single-stranded multimer of nucleotides.

Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

The term “cleaving,” as used herein, refers to a reaction that breaks the phosphodiester bonds between two adjacent nucleotides in both strands of a double-stranded DNA molecule, thereby resulting in a double-stranded break in the DNA molecule.

The term “nicking,” as used herein, refers to a reaction that breaks the phosphodiester bond between two adjacent nucleotides in only one strand of a double-stranded DNA molecule, thereby resulting in a break in one strand of the DNA molecule.

The term “cleavage site,” as used herein, refers to the site at which a double-stranded DNA molecule has been cleaved.

The term “nucleic acid-guided nuclease-gNA complex” refers to a complex comprising a nucleic acid-guided nuclease protein and a guide nucleic acid (gNA, for example a gRNA or a gDNA). For example, the term “Cas9-gRNA complex” refers to a complex comprising a Cas9 protein and a guide RNA (gRNA). The nucleic acid-guided nuclease may be any type of nucleic acid-guided nuclease, including but not limited to a wild-type nucleic acid-guided nuclease, a catalytically dead nucleic acid-guided nuclease, or a nucleic acid-guided nuclease-nickase.

The term “nucleic acid-guided nuclease-associated guide NA” refers to a guide nucleic acid (guide NA). The nucleic acid-guided nuclease-associated guide NA may exist as an isolated nucleic acid, or as part of a nucleic acid-guided nuclease-gNA complex, for example a Cas9-gRNA complex.

The terms “capture” and “enrichment” are used interchangeably herein, and refer to the process of selectively isolating a nucleic acid region containing: sequences of interest, targeted sites of interest, sequences not of interest, or targeted sites not of interest.

The term “hybridization” refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art. A nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.). One example of high stringency conditions includes hybridization at about 42° C. in 50% formamide, 5×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured carrier DNA followed by washing two times in 2×SSC and 0.5% SDS at room temperature and two additional times in 0.1×SSC and 0.5% SDS at 42° C.

The term “duplex,” or “duplexed,” as used herein, describes two complementary polynucleotides that are base-paired, i.e., hybridized together.

The term “amplifying” as used herein refers to generating one or more copies of a target nucleic acid, using the target nucleic acid as a template.

The term “genomic region,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.

The term “genomic sequence,” as used herein, refers to a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequences that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.

The term “genomic fragment,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. A genomic fragment may be an entire chromosome, or a fragment of a chromosome. A genomic fragment may be adapter ligated (in which case it has an adapter ligated to one or both ends of the fragment, or to at least the 5′ end of a molecule), or may not be adapter ligated.

In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example. Such an oligonucleotide may be employed in an assay that uses a sample containing a test genome, where the test genome contains a binding site for the oligonucleotide.

The term “ligating,” as used herein, refers to the enzymatically catalyzed joining of the terminal nucleotide at the 5′ end of a first DNA molecule to the terminal nucleotide at the 3′ end of a second DNA molecule.

If two nucleic acids are “complementary,” each base of one of the nucleic acids base pairs with the corresponding nucleotide in the other nucleic acid. The terms “complementary” and “perfectly complementary” are used synonymously herein.

The term “separating,” as used herein, refers to physical separation of two elements (e.g., by size or affinity, etc.) as well as degradation of one element, leaving the other intact. For example, size exclusion can be employed to separate nucleic acids, including cleaved targeted sequences.

In a cell, DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands. In certain cases, complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands. The assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure. Until they become covalently linked, the first and second strands are distinct molecules. For ease of description, the “top” and “bottom” strands of a double-stranded nucleic acid in which the top and bottom strands have been covalently linked will still be described as the “top” and “bottom” strands. In other words, for the purposes of this disclosure, the top and bottom strands of a double-stranded DNA do not need to be separate molecules. The nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions (e.g., BACs, assemblies, chromosomes, etc.) are known, and may be found in NCBI's Genbank database, for example.

The term “top strand,” as used herein, refers to either strand of a nucleic acid but not both strands of a nucleic acid. When an oligonucleotide or a primer binds or anneals “only to a top strand,” it binds to only one strand but not the other. The term “bottom strand,” as used herein, refers to the strand that is complementary to the “top strand.” When an oligonucleotide binds or anneals “only to one strand,” it binds to only one strand, e.g., the first or second strand, but not the other strand. If an oligonucleotide binds or anneals to both strands of a double-stranded DNA, the oligonucleotide may have two regions, a first region that hybridizes with the top strand of the double-stranded DNA, and a second region that hybridizes with the bottom strand of the double-stranded DNA.
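Because sequences are written 5′ to 3′ and the two strands of a duplex are antiparallel, the bottom strand of a given top strand is its reverse complement. A minimal Python sketch of that convention follows; the function names are illustrative, not taken from the disclosure, and only DNA bases (plus N) are handled.

```python
# Complement table for DNA; N (any base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bottom_strand(top: str) -> str:
    """Return the bottom strand, written 5'->3', for a top strand written 5'->3'."""
    # Complement each base, then reverse to restore 5'->3' reading order.
    return top.translate(_COMPLEMENT)[::-1]


def perfectly_complementary(a: str, b: str) -> bool:
    """True if every base of a pairs with the corresponding (antiparallel) base of b."""
    return len(a) == len(b) and bottom_strand(a).upper() == b.upper()


print(bottom_strand("GAGTC"))  # GACTC
```

Applying `bottom_strand` twice returns the original top strand, which reflects that the top/bottom assignment is arbitrary.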

The term “double-stranded DNA molecule” refers to both double-stranded DNA molecules in which the top and bottom strands are not covalently linked, as well as double-stranded DNA molecules in which the top and bottom strands are covalently linked. The top and bottom strands of a double-stranded DNA are base paired with one another by Watson-Crick interactions.

The term “denaturing,” as used herein, refers to the separation of at least a portion of the base pairs of a nucleic acid duplex by placing the duplex in suitable denaturing conditions. Denaturing conditions are well known in the art. In one embodiment, in order to denature a nucleic acid duplex, the duplex may be exposed to a temperature that is above the Tm of the duplex, thereby releasing one strand of the duplex from the other. In certain embodiments, a nucleic acid may be denatured by exposing it to a temperature of at least 90° C. for a suitable amount of time (e.g., at least 30 seconds, up to 30 mins). In certain embodiments, fully denaturing conditions may be used to completely separate the base pairs of the duplex. In other embodiments, partially denaturing conditions (e.g., with a lower temperature than fully denaturing conditions) may be used to separate the base pairs of certain parts of the duplex (e.g., regions enriched for A-T base pairs may separate while regions enriched for G-C base pairs may remain paired). Nucleic acid may also be denatured chemically (e.g., using urea or NaOH).
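The observation that A-T-rich regions separate before G-C-rich regions under partially denaturing conditions can be approximated by scanning a sequence for low-GC windows. This is a rough heuristic sketch only: GC fraction is a proxy, not a melting-temperature calculation, and the window size and the 0.3 threshold are arbitrary illustrative choices.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def at_rich_windows(seq: str, window: int, max_gc: float = 0.3) -> list[int]:
    """Start positions of windows whose GC fraction is at most max_gc.

    Heuristically, these windows are the ones expected to separate first
    under partially denaturing conditions.
    """
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [i for i in range(len(seq) - window + 1)
            if gc_fraction(seq[i:i + window]) <= max_gc]


print(at_rich_windows("A" * 10 + "GC" * 5, window=5))  # [0, 1, 2, 3, 4, 5, 6]
```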

The term “genotyping,” as used herein, refers to any type of analysis of a nucleic acid sequence, and includes sequencing, single nucleotide polymorphism (SNP) analysis, and analysis to identify rearrangements.

The term “sequencing,” as used herein, refers to a method by which the identity of consecutive nucleotides of a polynucleotide is obtained.

The term “next-generation sequencing” refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, for example, those currently employed by Illumina, Life Technologies, and Roche, etc. Next-generation sequencing methods may also include nanopore sequencing methods or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.

The term “complementary DNA” or “cDNA” refers to a double-stranded DNA sample that was produced from an RNA sample by reverse transcription of RNA (using primers such as random hexamers or oligo-dT primers), followed by second-strand synthesis, in which the RNA is digested with RNase H and the second strand is synthesized by DNA polymerase.

The term “RNA promoter adapter” refers to an adapter that contains a promoter for a bacteriophage RNA polymerase, e.g., the RNA polymerase from bacteriophage T3, T7, SP6 or the like.

Other definitions of terms may appear throughout the specification.

For any of the structural and functional characteristics described herein, methods of determining these characteristics are known in the art.

Guide Nucleic Acids (gNAs)

Provided herein are guide nucleic acids (gNAs) derivable from any nucleic acid source. The gNAs can be guide RNAs (gRNAs) or guide DNAs (gDNAs). The nucleic acid source can be DNA or RNA. Provided herein are methods to generate gNAs from any source nucleic acid, including DNA from a single organism, or mixtures of DNA from multiple organisms, or mixtures of DNA from multiple species, or DNA from clinical samples, or DNA from forensic samples, or DNA from environmental samples, or DNA from metagenomic DNA samples (for example a sample that contains more than one species of organism). Examples of any source DNA include, but are not limited to, any genome, any genome fragment, cDNA, synthetic DNA, or a DNA collection (e.g., a SNP collection, DNA libraries). The gNAs provided herein can be used for genome-wide applications.

In some embodiments, the gNAs are derived from genomic sequences (e.g., genomic DNA). In some embodiments, the gNAs are derived from mammalian genomic sequences. In some embodiments, the gNAs are derived from eukaryotic genomic sequences. In some embodiments, the gNAs are derived from prokaryotic genomic sequences. In some embodiments, the gNAs are derived from viral genomic sequences. In some embodiments, the gNAs are derived from bacterial genomic sequences. In some embodiments, the gNAs are derived from plant genomic sequences. In some embodiments, the gNAs are derived from microbial genomic sequences. In some embodiments, the gNAs are derived from genomic sequences from a parasite, for example a eukaryotic parasite.

In some embodiments, the gNAs are derived from repetitive DNA. In some embodiments, the gNAs are derived from abundant DNA. In some embodiments, the gNAs are derived from mitochondrial DNA. In some embodiments, the gNAs are derived from ribosomal DNA. In some embodiments, the gNAs are derived from centromeric DNA. In some embodiments, the gNAs are derived from DNA comprising Alu elements (Alu DNA). In some embodiments, the gNAs are derived from DNA comprising long interspersed nuclear elements (LINE DNA). In some embodiments, the gNAs are derived from DNA comprising short interspersed nuclear elements (SINE DNA). In some embodiments the abundant DNA comprises ribosomal DNA. In some embodiments, the abundant DNA comprises host DNA (e.g., host genomic DNA or all host DNA). In an example, the gNAs can be derived from host DNA (e.g., human, animal, plant) for the depletion of host DNA to allow for easier analysis of other DNA that is present (e.g., bacterial, viral, or other metagenomic DNA). In another example, the gNAs can be derived from the one or more most abundant types (e.g., species) in a mixed sample, such as the one or more most abundant bacterial species in a metagenomic sample. The one or more most abundant types (e.g., species) can comprise the two, three, four, five, six, seven, eight, nine, ten, or more than ten most abundant types (e.g., species). The most abundant types can be the most abundant kingdoms, phyla or divisions, classes, orders, families, genera, species, or other classifications. The most abundant types can be the most abundant cell types, such as epithelial cells, bone cells, muscle cells, blood cells, adipose cells, or other cell types. The most abundant types can be non-cancerous cells. The most abundant types can be cancerous cells. The most abundant types can be animal, human, plant, fungal, bacterial, or viral.
gNAs can be derived from both a host and the one or more most abundant non-host types (e.g., species) in a sample, such as from both human DNA and the DNA of the one or more most abundant bacterial species. In some embodiments, the abundant DNA comprises DNA from the more abundant or most abundant cells in a sample. For example, for a specific sample, the highly abundant cells can be extracted and their DNA can be used to produce gNAs; these gNAs can be used to produce a depletion library that is applied to the original sample to enable or enhance sequencing or detection of low-abundance targets.
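Choosing “the one or more most abundant types” in a mixed sample can be sketched from per-type read or cell counts. The counts, type names, and the cumulative-fraction stopping rule below are hypothetical illustrations, not part of the disclosure; in practice the counts might come from a preliminary sequencing or classification step.

```python
from collections import Counter


def most_abundant(counts: dict[str, int], k: int) -> list[str]:
    """Names of the k types with the highest counts, most abundant first."""
    return [name for name, _ in Counter(counts).most_common(k)]


def types_covering(counts: dict[str, int], fraction: float) -> list[str]:
    """Smallest prefix of types (by abundance) whose counts reach `fraction` of the total."""
    total = sum(counts.values())
    chosen, running = [], 0
    for name, n in Counter(counts).most_common():
        if running >= fraction * total:
            break
        chosen.append(name)
        running += n
    return chosen


# Hypothetical read counts per type in a host-dominated sample.
counts = {"host": 900, "species_a": 60, "species_b": 30, "species_c": 10}
print(most_abundant(counts, 2))      # ['host', 'species_a']
print(types_covering(counts, 0.95))  # ['host', 'species_a']
```

DNA from the selected types would then serve as the source for the depletion gNAs, leaving the remaining low-abundance types enriched after cleavage.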

In some embodiments, the gNAs are derived from DNA comprising short tandem repeats (STRs).

In some embodiments, the gNAs are derived from a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is an RNA genome.

In some embodiments, the gNAs are derived from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or virus; from an animal parasite; from a pathogen.

In some embodiments, the gNAs are derived from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat. In another embodiment the mammal is a type of monkey.

In some embodiments, the gNAs are derived from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.

In some embodiments, the gNAs are derived from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.

In some embodiments, the gNAs are derived from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.

In some embodiments, the gNAs are derived from a virus.

In some embodiments, the gNAs are derived from a species of fungi.

In some embodiments, the gNAs are derived from a species of algae.

In some embodiments, the gNAs are derived from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.

In some embodiments, the gNAs are derived from a nucleic acid target. Contemplated targets include, but are not limited to, pathogens; single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations; human SNPs or STRs; potential toxins; or animals, fungi, and plants. In some embodiments, the gNAs are derived from pathogens, and are pathogen-specific gNAs.

In some embodiments, a guide NA of the invention comprises a first NA segment comprising a targeting sequence, wherein the targeting sequence is 15-250 bp; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. In some embodiments, the targeting sequence is greater than 21 bp, greater than 22 bp, greater than 23 bp, greater than 24 bp, greater than 25 bp, greater than 26 bp, greater than 27 bp, greater than 28 bp, greater than 29 bp, greater than 30 bp, greater than 40 bp, greater than 50 bp, greater than 60 bp, greater than 70 bp, greater than 80 bp, greater than 90 bp, greater than 100 bp, greater than 110 bp, greater than 120 bp, greater than 130 bp, greater than 140 bp, or even greater than 150 bp. In an exemplary embodiment, the targeting sequence is greater than 30 bp. In some embodiments, the targeting sequences of the present invention range in size from 30-50 bp. In some embodiments, targeting sequences of the present invention range in size from 30-75 bp. In some embodiments, targeting sequences of the present invention range in size from 30-100 bp. For example, a targeting sequence can be at least 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp. In specific embodiments, the targeting sequence is at least 22 bp. In specific embodiments, the targeting sequence is at least 30 bp.

In some embodiments, target-specific gNAs can comprise a nucleic acid sequence that is complementary to a region on the opposite strand of the targeted nucleic acid sequence 5′ to a PAM sequence, which can be recognized by a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein. In some embodiments the targeted nucleic acid sequence is immediately 5′ to a PAM sequence. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 15-250 bp. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or 100 bp.
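Locating candidate targeting sequences immediately 5′ of a PAM can be sketched as a regular-expression scan. This minimal sketch assumes an NGG PAM (the motif recognized by S. pyogenes Cas9; the text above does not fix a particular PAM) and scans only the strand it is given. Each returned sequence is the DNA form of a targeting sequence that matches the PAM-bearing strand and is therefore complementary to the opposite strand. The 22 bp default is one of the embodiment lengths listed above, and the 15-250 bp check mirrors the stated range.

```python
import re


def candidate_targets(seq: str, length: int = 22,
                      pam: str = "[ACGT]GG") -> list[tuple[str, str]]:
    """(targeting sequence, PAM) pairs for every PAM with `length` bases 5' of it.

    The lookahead `(?=...)` lets overlapping PAMs (e.g. in GGG runs) all be found.
    """
    if not 15 <= length <= 250:
        raise ValueError("targeting sequence length must be 15-250 bp")
    seq = seq.upper()
    hits = []
    for m in re.finditer(f"(?=({pam}))", seq):
        start = m.start()
        if start >= length:  # enough upstream sequence to fill the targeting segment
            hits.append((seq[start - length:start], m.group(1)))
    return hits


spacer = "ACGTACGTACGTACGTACGTAC"  # hypothetical 22-base region
print(candidate_targets(spacer + "TGG" + "AAAA"))
# [('ACGTACGTACGTACGTACGTAC', 'TGG')]
```

Scanning the reverse complement of the input with the same function would cover PAMs on the other strand.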

In some particular embodiments, the targeting sequence is not 20 bp. In some particular embodiments, the targeting sequence is not 21 bp.

In some embodiments, the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).

In some embodiments, the gNAs comprise a label, are attached to a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.

In some embodiments, the gNAs are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylindrical, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.

Nucleic Acids Encoding gNAs

Also provided herein are nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that a gNA results from the reverse transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the reverse transcription of a gNA. In some embodiments, by encoding, it is meant that a gNA results from the amplification of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the amplification of a gNA.

In some embodiments, the nucleic acid encoding for a gNA comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence, wherein the second segment can range from 15 bp to 250 bp; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.

In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.

In some embodiments, the nucleic acids encoding for gNAs comprise RNA.

In some embodiments, the nucleic acids encoding for gNAs comprise DNA and RNA.

In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
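The three-segment layout described above (regulatory region, targeting sequence, protein-binding sequence) can be illustrated with a short sketch that assembles a hypothetical DNA template. The promoter sequences are commonly used minimal phage promoter sequences and are an assumption, not taken from this document; the scaffold is the stem-loop-encoding strand given later as SEQ ID NO: 3. The function names `build_template` and `expected_transcript` are hypothetical, and the start-site rule (transcription beginning at the promoter's final G) is a simplifying assumption.

```python
# Hypothetical sketch, not the authors' protocol: assemble the top strand of a
# DNA template encoding a gNA as [promoter] + [targeting sequence] + [scaffold].

# Commonly used minimal phage promoter sequences (assumption; verify for your
# polymerase). Each ends in the G assumed here to be the +1 transcription start.
PROMOTERS = {
    "T7": "TAATACGACTCACTATAG",
    "SP6": "ATTTAGGTGACACTATAG",
    "T3": "AATTAACCCTCACTAAAG",
}

# Stem-loop-encoding strand given in this document as SEQ ID NO: 3.
SCAFFOLD_DNA = "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"


def build_template(targeting: str, promoter: str = "T7", scaffold: str = SCAFFOLD_DNA) -> str:
    """Return the 5'->3' top strand: promoter + targeting segment + scaffold."""
    targeting = targeting.upper()
    # The document describes a second (targeting) segment of 15-250 bp.
    if not 15 <= len(targeting) <= 250:
        raise ValueError("targeting segment must be 15-250 nt long")
    if set(targeting) - set("ACGT"):
        raise ValueError("targeting segment must contain only A, C, G, T")
    return PROMOTERS[promoter] + targeting + scaffold


def expected_transcript(template: str, promoter: str = "T7") -> str:
    """Approximate RNA product, assuming transcription starts at the promoter's final G."""
    start = len(PROMOTERS[promoter]) - 1
    return template[start:].replace("T", "U")


if __name__ == "__main__":
    tpl = build_template("GATTACAGATTACAGATTAC")
    print(expected_transcript(tpl)[:31])
```

Keeping the promoter table as a dictionary makes swapping among the T7, SP6, and T3 embodiments a one-argument change.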

Collections of gNAs

Provided herein are collections (interchangeably referred to as libraries) of gNAs.

As used herein, a collection of gNAs denotes a mixture of gNAs containing at least 10² unique gNAs. In some embodiments, a collection of gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique gNAs. In some embodiments, a collection of gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ gNAs.
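The definition above distinguishes unique gNAs from the total number of gNAs. A minimal sketch of that distinction, assuming each gNA is represented as a sequence string in a list (one entry per molecule or read), might look like the following; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def collection_stats(gnas):
    """Return (number of unique sequences, total number of entries)."""
    counts = Counter(seq.upper() for seq in gnas)
    return len(counts), sum(counts.values())


def meets_collection_definition(gnas, min_unique=10**2):
    """True if the pool holds at least `min_unique` unique gNAs (default 10^2)."""
    unique, _total = collection_stats(gnas)
    return unique >= min_unique


if __name__ == "__main__":
    unique_4mers = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 unique
    pool = unique_4mers + unique_4mers[:10]                        # plus 10 duplicates
    print(collection_stats(pool))  # (256, 266)
```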

In some embodiments, a collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the gNAs in the collection vary in size. In some embodiments, the first and second segments are in 5′-to-3′ order.

In some embodiments, the size of the first segment varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 21 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 25 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 30 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 15-50 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 30-100 bp.
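The percentage thresholds above can be checked against a list of first-segment lengths. The sketch below is a hypothetical illustration; treating the range ends (for example 15 and 50 in "15-50 bp") as inclusive is an assumption, since the text does not say.

```python
def fraction_of_segments(lengths, predicate):
    """Fraction of first-segment lengths (bp) that satisfy `predicate`."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("empty collection")
    return sum(1 for n in lengths if predicate(n)) / len(lengths)


# Criteria mirroring the paragraphs above; inclusive range ends are an assumption.
CRITERIA = {
    "> 21 bp": lambda n: n > 21,
    "> 25 bp": lambda n: n > 25,
    "> 30 bp": lambda n: n > 30,
    "15-50 bp": lambda n: 15 <= n <= 50,
    "30-100 bp": lambda n: 30 <= n <= 100,
}


def size_profile(lengths):
    """Map each criterion name to the fraction of segments meeting it."""
    return {name: fraction_of_segments(lengths, test) for name, test in CRITERIA.items()}


if __name__ == "__main__":
    for name, frac in size_profile([20, 22, 25, 30, 40, 60]).items():
        print(f"{name}: {frac:.0%}")
```

A collection would then satisfy, for example, the "at least 50% greater than 25 bp" embodiment if `size_profile(...)["> 25 bp"] >= 0.5`.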

In some particular embodiments, the size of the first segment is not 20 bp.

In some particular embodiments, the size of the first segment is not 21 bp.

In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.
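The text does not define how "variability in the sequence of the 5′ end" is measured. One possible metric, offered only as an assumption, is the number of distinct 5′ k-mers divided by the number of members; the window size `k` and the function name are hypothetical.

```python
def five_prime_variability(targeting_seqs, k=4):
    """Distinct 5' k-mers / number of members (one possible, assumed metric)."""
    # Treat U and T as equivalent so RNA and DNA spellings compare equally.
    seqs = [s.upper().replace("U", "T") for s in targeting_seqs]
    if not seqs:
        raise ValueError("empty collection")
    if any(len(s) < k for s in seqs):
        raise ValueError("a sequence is shorter than k")
    prefixes = {s[:k] for s in seqs}
    return len(prefixes) / len(seqs)


if __name__ == "__main__":
    members = ["GATCAAA", "GATCCCC", "CCGAAAA", "UUAAGGG"]
    print(five_prime_variability(members))  # 3 distinct 4-mers over 4 members
```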

In some embodiments, the 3′ end of the gNA targeting sequence can be any purine or pyrimidine (and/or modified versions of the same). In some embodiments, the 3′ end of the gNA targeting sequence is an adenine. In some embodiments, the 3′ end of the gNA targeting sequence is a guanine. In some embodiments, the 3′ end of the gNA targeting sequence is a cytosine. In some embodiments, the 3′ end of the gNA targeting sequence is a uracil. In some embodiments, the 3′ end of the gNA targeting sequence is a thymine. In some embodiments, the 3′ end of the gNA targeting sequence is not cytosine.

In some embodiments, the collection of gNAs comprises targeting sequences that can base-pair with the targeted DNA, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.
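One way to test a spacing embodiment is to read "spaced at least every N bp" as "no stretch of the genome longer than N bp lacks a target site". That reading, the use of 0-based site coordinates, and the function names below are assumptions for illustration.

```python
def max_gap(sites, genome_length):
    """Largest stretch (bp) without a target site; genome ends count as boundaries.

    `sites` are 0-based coordinates of target sites on a genome of `genome_length` bp.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not sites:
        return genome_length
    ordered = sorted(sites)
    gaps = [ordered[0]]                                      # start of genome to first site
    gaps += [b - a for a, b in zip(ordered, ordered[1:])]    # between neighbouring sites
    gaps.append(genome_length - ordered[-1])                 # last site to end of genome
    return max(gaps)


def spaced_at_least_every(sites, genome_length, spacing_bp):
    """True if no gap between target sites exceeds `spacing_bp`."""
    return max_gap(sites, genome_length) <= spacing_bp


if __name__ == "__main__":
    print(max_gap([10, 40, 70], 100))                    # 30
    print(spaced_at_least_every([10, 40, 70], 100, 30))  # True
```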

In some embodiments, the collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the gNAs in the collection can have a variety of second NA segments with various specificities for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of gNAs as provided herein can comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, and also members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.

In some embodiments, a plurality of the gNA members of the collection are attached to a label, comprise a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen-binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox-active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.

In some embodiments, a plurality of the gNA members of the collection are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethylene glycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, or other materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat; for example, the array is a wave-like array. Substrates can have any shape, including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.

Collections of Nucleic Acids Encoding gNAs

Provided herein are collections (interchangeably referred to as libraries) of nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA.

As used herein, a collection of nucleic acids encoding for gNAs denotes a mixture of nucleic acids containing at least 10² unique nucleic acids. In some embodiments, a collection of nucleic acids encoding for gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique nucleic acids encoding for gNAs. In some embodiments, a collection of nucleic acids encoding for gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ nucleic acids encoding for gNAs.

In some embodiments, a collection of nucleic acids encoding for gNAs comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.

In some embodiments, the first, second, and third segments are in 5′-to-3′ order.

In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.

In some embodiments, the nucleic acids encoding for gNAs comprise RNA.

In some embodiments, the nucleic acids encoding for gNAs comprise DNA and RNA.

In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.

In some embodiments, the size of the second segments (targeting sequences) in the collection varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 21 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 25 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 30 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 15-50 bp.

In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 30-100 bp.

In some particular embodiments, the size of the second segment is not 20 bp.

In some particular embodiments, the size of the second segment is not 21 bp.

In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.

In some embodiments, the collection of nucleic acids comprises targeting sequences, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.

In some embodiments, the collection of nucleic acids encoding for gNAs comprises a third segment encoding for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the segments in the collection vary in their specificity for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of nucleic acids encoding for gNAs as provided herein can comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, and also members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.

Sequences of Interest

Provided herein are gNAs and collections of gNAs, derived from any source DNA (for example, from genomic DNA, cDNA, artificial DNA, or DNA libraries), that can be used to target sequences of interest in a sample for a variety of applications including, but not limited to, enrichment, depletion, capture, partitioning, labeling, regulation, and editing. The gNAs comprise a targeting sequence directed at sequences of interest.

In some embodiments, the sequences of interest are genomic sequences (genomic DNA). In some embodiments, the sequences of interest are mammalian genomic sequences. In some embodiments, the sequences of interest are eukaryotic genomic sequences. In some embodiments, the sequences of interest are prokaryotic genomic sequences. In some embodiments, the sequences of interest are viral genomic sequences. In some embodiments, the sequences of interest are bacterial genomic sequences. In some embodiments, the sequences of interest are plant genomic sequences. In some embodiments, the sequences of interest are microbial genomic sequences. In some embodiments, the sequences of interest are genomic sequences from a parasite, for example a eukaryotic parasite. In some embodiments, the sequences of interest are host genomic sequences (e.g., the host organism of a microbiome, a parasite, or a pathogen). In some embodiments, the sequences of interest are abundant genomic sequences, such as sequences from the genome or genomes of the most abundant species in a sample.

In some embodiments, the sequences of interest comprise repetitive DNA. In some embodiments, the sequences of interest comprise abundant DNA. In some embodiments, the sequences of interest comprise mitochondrial DNA. In some embodiments, the sequences of interest comprise ribosomal DNA. In some embodiments, the sequences of interest comprise centromeric DNA. In some embodiments, the sequences of interest comprise DNA comprising Alu elements (Alu DNA). In some embodiments, the sequences of interest comprise long interspersed nuclear elements (LINE DNA). In some embodiments, the sequences of interest comprise short interspersed nuclear elements (SINE DNA). In some embodiments, the abundant DNA comprises ribosomal DNA.

In some embodiments, the sequences of interest comprise single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), cancer genes, insertions, deletions, structural variations, exons, genetic mutations, or regulatory regions.

In some embodiments, the sequences of interest can be a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is an RNA genome.

In some embodiments, the sequences of interest are from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or a virus; from an animal parasite; or from a pathogen.

In some embodiments, the sequences of interest are from any mammalian organism. In one embodiment, the mammal is a human. In another embodiment, the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat. In another embodiment, the mammal is a monkey.

In some embodiments, the sequences of interest are from any bird or avian organism. An avian organism includes, but is not limited to, chicken, turkey, duck, and goose.

In some embodiments, the sequences of interest are from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.

In some embodiments, the sequences of interest are from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.

In some embodiments, the sequences of interest are from a virus.

In some embodiments, the sequences of interest are from a species of fungi.

In some embodiments, the sequences of interest are from a species of algae.

In some embodiments, the sequences of interest are from any mammalian parasite.

In some embodiments, the sequences of interest are obtained from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.

In some embodiments, the sequences of interest are from a pathogen.

Targeting Sequences

As used herein, a targeting sequence is one that directs the gNA to the sequences of interest in a sample. For example, a targeting sequence targets a particular sequence of interest, such as a genomic sequence of interest.

Provided herein are gNAs and collections of gNAs that comprise a segment that comprises a targeting sequence. Also provided herein are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs, that comprise a segment encoding for a targeting sequence.

In some embodiments, the targeting sequence comprises DNA.

In some embodiments, the targeting sequence comprises RNA.

In some embodiments, the targeting sequence comprises RNA and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or 100% sequence identity with a sequence 5′ to a PAM sequence on a sequence of interest, except that the RNA comprises uracils instead of thymines. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
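Deriving RNA targeting sequences from the sequence 5′ to an AGG, CGG, or TGG PAM can be sketched as a scan of one DNA strand. The default 20-nt length is an assumption (the document allows targeting sequences of other lengths), only the strand passed in is scanned (the other strand could be handled by scanning its reverse complement), and the function name is hypothetical.

```python
def find_targeting_sequences(dna, length=20, pams=("AGG", "CGG", "TGG")):
    """Scan one strand (5'->3') for PAMs.

    Returns (start, DNA protospacer, RNA targeting sequence) for each PAM that has
    `length` nucleotides available immediately 5' of it.
    """
    dna = dna.upper()
    hits = []
    # i is the PAM start; i + 3 <= len(dna) keeps the full PAM inside the sequence.
    for i in range(length, len(dna) - 2):
        if dna[i:i + 3] in pams:
            protospacer = dna[i - length:i]
            # The RNA targeting sequence matches the protospacer with U in place of T.
            hits.append((i - length, protospacer, protospacer.replace("T", "U")))
    return hits


if __name__ == "__main__":
    seq = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAA"
    for start, proto, guide in find_targeting_sequences(seq):
        print(start, proto, guide)
```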

In some embodiments, the targeting sequence comprises DNA and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or 100% sequence identity with a sequence 5′ to a PAM sequence on a sequence of interest.

In some embodiments, the targeting sequence comprises RNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.

In some embodiments, the targeting sequence comprises DNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
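The identity and complementarity percentages above can be illustrated with an ungapped, position-by-position comparison. This is a simplifying assumption (real analyses may use alignments); U and T are treated as equivalent, and the opposite strand is supplied 5′ to 3′ and reverse-complemented so that the comparison is antiparallel. Function names are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _as_dna(seq):
    """Uppercase and convert U to T so RNA and DNA compare position by position."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    return _as_dna(dna).translate(_DNA_COMPLEMENT)[::-1]


def percent_identity(a, b):
    """Fraction of identical positions in an ungapped comparison of equal-length sequences."""
    a, b = _as_dna(a), _as_dna(b)
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def percent_complementarity(guide, opposite_strand):
    """Complementarity of `guide` to `opposite_strand` (given 5'->3'), compared antiparallel."""
    return percent_identity(guide, reverse_complement(opposite_strand))


if __name__ == "__main__":
    print(percent_identity("GAUUACA", "GATTACC"))         # 6 of 7 positions match
    print(percent_complementarity("GAUUACA", "TGTAATC"))  # 1.0
```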

In some embodiments, a DNA encoding for a targeting sequence of a gRNA shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or 100% sequence identity with the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.

In some embodiments, a DNA encoding for a targeting sequence of a gRNA is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence and is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to a sequence 5′ to a PAM sequence on a sequence of interest. In some embodiments, the PAM sequence is AGG, CGG, or TGG.

Nucleic Acid-Guided Nuclease System Proteins

Provided herein are gNAs and collections of gNAs comprising a segment that comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. Also provided herein are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs, that comprise a segment encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. A nucleic acid-guided nuclease system can be an RNA-guided nuclease system. A nucleic acid-guided nuclease system can be a DNA-guided nuclease system.

Methods of the present disclosure can utilize nucleic acid-guided nucleases. As used herein, a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA, or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity. Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.

The nucleic acid-guided nucleases provided herein can be DNA-guided DNA nucleases, DNA-guided RNA nucleases, RNA-guided DNA nucleases, or RNA-guided RNA nucleases. The nucleases can be endonucleases. The nucleases can be exonucleases. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided DNA endonuclease. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided RNA endonuclease.

A nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system. For example, a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.

In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems. In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, Csf1, C2c2, and NgAgo.

In some embodiments, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) can be from any bacterial or archaeal species.

In some embodiments, the nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) are from, or are derived from, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.

In some embodiments, nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins can be naturally occurring or engineered versions.

In some embodiments, naturally occurring nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. Engineered versions of such proteins can also be employed.

In some embodiments, engineered examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include catalytically dead nucleic acid-guided nuclease system proteins. The term “catalytically dead” generally refers to a nucleic acid-guided nuclease system protein that has inactivated nuclease domains (e.g., HNH and RuvC nuclease domains). Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the target nucleic acid (e.g., double-stranded DNA). In some embodiments, the catalytically dead nucleic acid-guided nuclease system protein is a catalytically dead CRISPR/Cas system protein, such as catalytically dead Cas9 (dCas9). Because dCas9 binds its target without cutting it, dCas9 allows separation of a mixture into unbound nucleic acids and dCas9-bound fragments. In one embodiment, a dCas9/gRNA complex binds to targets determined by the gRNA sequence. Bound dCas9 can prevent cutting by Cas9 at that site while other manipulations proceed. In another embodiment, the dCas9 can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site. Naturally occurring catalytically dead nucleic acid-guided nuclease system proteins can also be employed.

In some embodiments, engineered examples of nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins also include nucleic acid-guided nickases (e.g., Cas nickases). A nucleic acid-guided nickase refers to a modified version of a nucleic acid-guided nuclease system protein that contains a single inactive catalytic domain. In one embodiment, the nucleic acid-guided nickase is a Cas nickase, such as a Cas9 nickase. A Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC or the HNH domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, either the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nickases bound to two gNAs that target opposite strands will create a double-strand break in a target double-stranded DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA (e.g., Cas9/gRNA) complexes be specifically bound at a site before a double-strand break is formed. Naturally occurring nickase nucleic acid-guided nuclease system proteins can also be employed.

In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins. For example, a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein may be fused to another protein, for example, an activator, a repressor, a nuclease, a fluorescent molecule, a radioactive tag, or a transposase.

In some embodiments, the nucleic acid-guided nuclease system protein-binding sequence comprises a gNA (e.g., gRNA) stem-loop sequence.

In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4).

In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.

In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).

In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).

In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.

In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2).
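The strand and transcript relationships stated in the embodiments above can be checked mechanically. The minimal Python sketch below assumes only Watson-Crick pairing and uses sequences copied from the text; it confirms that SEQ ID NO: 4 and SEQ ID NO: 6 are the reverse complements of SEQ ID NO: 3 and SEQ ID NO: 5, and that reading each single-stranded template yields SEQ ID NO: 1 and SEQ ID NO: 2, respectively. The helper names are illustrative, not part of the described method.

```python
# Sketch: check strand pairing and template transcription for SEQ ID NOs: 1-6.
# Assumes uppercase sequences and Watson-Crick pairing only.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand (both written 5'->3')."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (T -> U)."""
    return dna.replace("T", "U")


def transcribe_template(template: str) -> str:
    """RNA (5'->3') synthesized by reading a single-stranded DNA template 3'->5'."""
    return dna_to_rna(reverse_complement(template))


SEQ_ID_3 = "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
SEQ_ID_4 = "AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC"
SEQ_ID_1 = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU"

SEQ_ID_5 = "GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC"
SEQ_ID_6 = "GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC"
SEQ_ID_2 = "GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC"

for coding, template, rna in [(SEQ_ID_3, SEQ_ID_4, SEQ_ID_1),
                              (SEQ_ID_5, SEQ_ID_6, SEQ_ID_2)]:
    assert reverse_complement(coding) == template  # the two strands pair fully
    assert transcribe_template(template) == rna    # ssDNA template -> stem-loop RNA
    assert dna_to_rna(coding) == rna               # RNA matches the non-template strand
    print(f"{len(template)}-nt template encodes a {len(rna)}-nt stem-loop RNA")
```

The same three assertions can be reused to proofread any new stem-loop variant before it is ordered as an oligonucleotide.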

In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a single transcribed component, which upon transcription yields an NA (e.g., RNA) stem-loop sequence. In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded and comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded and comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the third segment comprises two sub-segments, which encode for a crRNA and a tracrRNA upon transcription. In some embodiments, the crRNA does not comprise the N20 targeting sequence plus the extra sequence that can hybridize with the tracrRNA. In some embodiments, the crRNA comprises the extra sequence that can hybridize with the tracrRNA. In some embodiments, the two sub-segments are independently transcribed. In some embodiments, the two sub-segments are transcribed as a single unit. In some embodiments, the DNA encoding the crRNA comprises N_(target)GTTTTAGAGCTATGCTGTTTTG (SEQ ID NO: 7), where N_(target) represents the targeting sequence. In some embodiments, the DNA encoding the tracrRNA comprises the sequence GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT (SEQ ID NO: 8).
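The crRNA-encoding DNA of SEQ ID NO: 7 is the targeting sequence followed by a fixed repeat. The short Python sketch below assumes a targeting sequence written in DNA letters of any length (N20 is typical but not required); the function names and the 20-nt target are hypothetical. It assembles that DNA and shows the crRNA, and the tracrRNA encoded by SEQ ID NO: 8, in the RNA alphabet.

```python
# Sketch: assemble crRNA-encoding DNA as N_target + repeat (SEQ ID NO: 7).
CRRNA_REPEAT_DNA = "GTTTTAGAGCTATGCTGTTTTG"  # fixed 3' portion of SEQ ID NO: 7
TRACRRNA_DNA = (                             # SEQ ID NO: 8
    "GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAAC"
    "TTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
)


def crrna_encoding_dna(target: str) -> str:
    """Return N_target followed by the crRNA repeat, in DNA letters."""
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("targeting sequence must be a non-empty A/C/G/T string")
    return target + CRRNA_REPEAT_DNA


def dna_to_rna(dna: str) -> str:
    """Rewrite DNA letters as RNA letters (T -> U)."""
    return dna.replace("T", "U")


n_target = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt targeting sequence
crrna = dna_to_rna(crrna_encoding_dna(n_target))
tracrrna = dna_to_rna(TRACRRNA_DNA)
print(crrna)
print(tracrrna)
```

Whether the two sub-segments are transcribed independently or as a single unit does not change these sequences, only how the transcription units are arranged.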

In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a DNA sequence which, upon transcription, yields a gRNA stem-loop sequence capable of binding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein. In one embodiment, the DNA sequence can be double-stranded. In some embodiments, the double-stranded DNA of the third segment comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the double-stranded DNA of the third segment comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In one embodiment, the DNA sequence can be single-stranded. In some embodiments, the single-stranded DNA of the third segment comprises the following DNA sequence (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template. In some embodiments, the single-stranded DNA of the third segment comprises the following DNA sequence (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.
In some embodiments, the third segment comprises a DNA sequence which, upon transcription, yields a first RNA sequence that is capable of forming a hybrid with a second RNA sequence, and which hybrid is capable of CRISPR/Cas system protein binding. In some embodiments, the third segment is double-stranded DNA comprising the DNA sequence on one strand: (5′>3′, GTTTTAGAGCTATGCTGTTTTG) (SEQ ID NO: 9) and its reverse-complementary DNA sequence on the other strand: (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the third segment is single-stranded DNA comprising the DNA sequence of (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the second segment and the third segment together encode for a crRNA sequence. In some embodiments, the second RNA sequence that is capable of forming a hybrid with the first RNA sequence encoded by the third segment of the nucleic acid encoding a gRNA is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11). In some embodiments, the tracrRNA is encoded by a double-stranded DNA comprising the sequence of (5′>3′, GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 8), optionally fused with a regulatory sequence at its 5′ end. In some embodiments, the regulatory sequence can be bound by a transcription factor. In some embodiments, the regulatory sequence is a promoter. In some embodiments, the regulatory sequence is a T7 promoter comprising the sequence of (5′>3′, GCCTCGAGCTAATACGACTCACTATAGAG) (SEQ ID NO: 12).
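The hybrid between the crRNA repeat (the transcript of SEQ ID NO: 9) and the tracrRNA (SEQ ID NO: 11) can be located by searching the tracrRNA for the reverse complement of the repeat. The Python sketch below makes a simplifying assumption: it considers only contiguous Watson-Crick pairing and ignores G·U wobble pairs and bulges, so it reports the longest perfectly complementary stretch rather than the full duplex. Function names are illustrative.

```python
# Sketch: find the tracrRNA "anti-repeat" that pairs with the crRNA repeat.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna: str) -> str:
    """Reverse complement in the RNA alphabet (Watson-Crick pairs only)."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous string shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


REPEAT_RNA = "GUUUUAGAGCUAUGCUGUUUUG"  # transcript of SEQ ID NO: 9
TRACR_RNA = (                         # SEQ ID NO: 11
    "GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAAC"
    "UUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU"
)

# A tracrRNA stretch can pair with the repeat only if it equals the repeat's
# reverse complement, so search for the longest shared substring.
anti_repeat = longest_common_substring(rna_reverse_complement(REPEAT_RNA), TRACR_RNA)
start = TRACR_RNA.find(anti_repeat)
print(anti_repeat, len(anti_repeat), start)
```

With these inputs the search returns the 14-nt stretch CAAAACAGCAUAGC near the 5′ end of the tracrRNA, which is the region expected to anneal to the repeat encoded by the third segment.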

In some embodiments, provided herein is a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment encodes for an RNA sequence that, upon post-transcriptional cleavage, yields a first RNA segment and a second RNA segment. In some embodiments, the first RNA segment comprises a crRNA and the second RNA segment comprises a tracrRNA, which can form a hybrid and together provide for nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the third segment further comprises a spacer between the transcriptional units for the first RNA segment and the second RNA segment, and the spacer comprises an enzyme cleavage site.

In some embodiments, provided herein is a gNA (e.g., gRNA) comprising a first NA segment comprising a targeting sequence and a second NA segment comprising a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the size of the first segment is greater than 30 bp. In some embodiments, the second segment comprises a single segment, which comprises the gRNA stem-loop sequence. In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1). In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the second segment comprises two sub-segments: a first RNA sub-segment (crRNA) that forms a hybrid with a second RNA sub-segment (tracrRNA), which together act to direct nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the sequence of the second sub-segment comprises GUUUUAGAGCUAUGCUGUUUUG. In some embodiments, the first RNA segment and the second RNA segment together form a crRNA sequence. In some embodiments, the other RNA that will form a hybrid with the second RNA segment is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence of (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11).

CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, CRISPR/Cas system proteins are used in the embodiments provided herein. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.

In some embodiments, CRISPR/Cas system proteins can be from any bacterial or archaeal species.

In some embodiments, the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.

In some embodiments, the CRISPR/Cas system proteins are from, or are derived from, CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.

In some embodiments, CRISPR/Cas system proteins can be naturally occurring or engineered versions.

In some embodiments, naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.

In an exemplary embodiment, the CRISPR/Cas system protein comprises Cas9.

A “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a guide NA (e.g., a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.

A CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild-type CRISPR/Cas system protein. The CRISPR/Cas system protein may have all the functions of a wild-type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
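Percent-identity thresholds such as “at least 60% identical” presuppose an alignment. The Python sketch below assumes the two sequences have already been aligned (gaps written as '-') and uses one common convention: identical positions divided by aligned columns, excluding columns that are gaps in both sequences. Other conventions, such as dividing by the length of the wild-type sequence, give different numbers. The function names are hypothetical and the sequences are toy strings, not real proteins.

```python
# Sketch: percent identity of two PRE-ALIGNED sequences ('-' = gap).
# Convention assumed here: matches / columns that are not gaps in both rows.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity (%) between two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)  # gap-vs-residue never matches
    return 100.0 * matches / len(columns)


def meets_identity_threshold(variant: str, wild_type: str, threshold: float = 60.0) -> bool:
    """True if the aligned variant is at least `threshold` percent identical."""
    return percent_identity(variant, wild_type) >= threshold


# Toy example: 7 of 8 aligned residues are identical.
print(percent_identity("MDKKYSIG", "MDKKYSLG"))
print(meets_identity_threshold("MDKKYSIG", "MDKKYSLG"))
```

In practice the alignment itself would come from a standard pairwise aligner; the sketch only covers the final counting step.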

The term “CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The CRISPR/Cas system protein-associated guide NA may exist as an isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.

Cas9

In some embodiments, the CRISPR/Cas system protein nucleic acid-guided nuclease is or comprises Cas9. The Cas9 of the present invention can be isolated, recombinantly produced, or synthetic.

Examples of Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang, “In vivo genome editing using Staphylococcus aureus Cas9,” Nature 520, 186-191 (9 Apr. 2015), doi:10.1038/nature14299, which is incorporated herein by reference.

In some embodiments, the Cas9 is from a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.

In some embodiments, the Cas9 is from a Type II CRISPR system derived from S. pyogenes, and the PAM sequence is NGG, located immediately 3′ of the target-specific guide sequence. The PAM sequences of Type II CRISPR systems from exemplary bacterial species also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA), and Treponema denticola (NAAAAC), all of which are usable without deviating from the present invention.
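PAM sequences are written in IUPAC ambiguity codes (N = any base, R = A or G, and so on), so locating candidate target sites amounts to a pattern search. The Python sketch below translates a PAM from the list above into a regular expression and reports each PAM on the given strand together with the protospacer immediately 5′ of it. The 20-nt protospacer length and the function names are assumptions for illustration; only the strand passed in is scanned, so the other strand would be handled by scanning its reverse complement.

```python
import re

# IUPAC nucleotide codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

PAMS = {  # PAM sequences as listed in the text
    "Streptococcus pyogenes": "NGG",
    "Staphylococcus aureus": "NNGRRT",
    "Neisseria meningitidis": "NNNNGATT",
    "Streptococcus thermophilus": "NNAGAA",
    "Treponema denticola": "NAAAAC",
}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_protospacers(seq: str, pam: str, spacer_len: int = 20) -> list:
    """(protospacer, PAM) pairs on this strand; the PAM lies 3' of the protospacer."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping PAM occurrences all be reported.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    hits = []
    for m in pattern.finditer(seq):
        pam_start = m.start()
        if pam_start >= spacer_len:  # need a full-length protospacer upstream
            hits.append((seq[pam_start - spacer_len:pam_start], m.group(1)))
    return hits


print(find_protospacers("ACGTACGTACGTACGTACGTTGGA", PAMS["Streptococcus pyogenes"]))
```

Because each PAM is looked up in the table, the same search can be rerun for any of the listed orthologs by changing only the dictionary key.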

In one exemplary embodiment, the Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR, and then cloned into pET30 (from EMD Biosciences) to express the recombinant 6×His-tagged protein in bacteria and purify it.

A “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a guide NA. A Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild-type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein. The Cas9 protein may have all the functions of a wild-type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.

The term “Cas9-associated guide NA” refers to a guide NA as described above. The Cas9-associated guide NA may exist in isolated form, or as part of a Cas9-gNA complex.

Non-CRISPR/Cas System Nucleic Acid-Guided Nucleases

In some embodiments, non-CRISPR/Cas system proteins are used in the embodiments provided herein.

In some embodiments, the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.

In some embodiments, the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.

In some embodiments, the non-CRISPR/Cas system proteins are from, or are derived from, Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, Natronobacterium gregoryi, or Corynebacterium diphtheriae.

In some embodiments, the non-CRISPR/Cas system proteins can be naturally occurring or engineered versions.

In some embodiments, a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).

A “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a guide NA (e.g., a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.

A non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild-type non-CRISPR/Cas system protein. The non-CRISPR/Cas system protein may have all the functions of a wild-type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.

The term “non-CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The non-CRISPR/Cas system protein-associated guide NA may exist as an isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.

Catalytically Dead Nucleic Acid-Guided Nucleases

In some embodiments, engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases). The term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nuclease domains, for example, inactivated HNH and RuvC nuclease domains. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the nucleic acid.

Accordingly, the catalytically dead nucleic acid-guided nuclease allows separation of a mixture into unbound nucleic acids and fragments bound by the catalytically dead nucleic acid-guided nuclease. In one exemplary embodiment, a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. Bound dCas9 can prevent cutting by Cas9 at those sites while other manipulations proceed.

In another embodiment, the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.

In some embodiments, the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2c2, or dNgAgo.

In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease protein is dCas9.

Nucleic Acid-Guided Nuclease Nickases

In some embodiments, engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).

In some embodiments, engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.

In some embodiments, the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2c2 nickase, or an NgAgo nickase.

In one embodiment, the nucleic acid-guided nuclease nickase is a Cas9 nickase.

In some embodiments, a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, either the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to two gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.

In exemplary embodiments, a Cas9 nickase can be used to bind to a target sequence. The term “Cas9 nickase” refers to a modified version of the Cas9 protein containing a single inactive catalytic domain, i.e., either the RuvC or the HNH domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, either the guide RNA-hybridized strand or the non-hybridized strand may be cleaved. Cas9 nickases bound to two gRNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.

Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase. In one exemplary embodiment, a nucleic acid-guided nuclease nickase cuts a single strand of a double-stranded nucleic acid, wherein the double-stranded region comprises methylated nucleotides.

Dissociable and Thermostable Nucleic Acid-Guided Nucleases

In some embodiments, thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases). In such embodiments, the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is then lowered, allowing for the generation of additional cleaved target sequences. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% activity when held at 75° C. or higher for at least 1 minute. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity when held for at least 1 minute at a temperature of at least 75° C., at least 80° C., at least 85° C., at least 90° C., at least 91° C., at least 92° C., at least 93° C., at least 94° C., at least 95° C., at least 96° C., at least 97° C., at least 98° C., at least 99° C., or at least 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity when held at 75° C. or higher for at least 1 minute, 2 minutes, 3 minutes, 4 minutes, or 5 minutes. In some embodiments, a thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.
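Each thermostability criterion above combines three numbers: a minimum temperature, a minimum hold time, and a minimum fraction of activity retained. The Python sketch below shows how a heat-challenge result might be scored against such a criterion. It assumes activity is measured in arbitrary but consistent units before and after heating (for example, fraction of substrate cleaved); the function names and readings are hypothetical, and the defaults mirror the 50%, 75° C., 1-minute embodiment.

```python
# Sketch: score a heat-challenge assay against a thermostability criterion.
# Activity units are arbitrary but must be the same before and after heating.


def percent_retained(initial_activity: float, residual_activity: float) -> float:
    """Residual activity as a percentage of the activity before heating."""
    if initial_activity <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * residual_activity / initial_activity


def meets_thermostability(initial_activity: float, residual_activity: float,
                          temp_c: float, minutes: float,
                          min_retained: float = 50.0,
                          min_temp_c: float = 75.0,
                          min_minutes: float = 1.0) -> bool:
    """True if the challenge was at least as harsh as the criterion and the
    enzyme kept at least `min_retained` percent of its activity."""
    if temp_c < min_temp_c or minutes < min_minutes:
        return False  # a milder challenge cannot establish the criterion
    return percent_retained(initial_activity, residual_activity) >= min_retained


# Hypothetical readings for the exemplary 90% / 95 C / 1 min embodiment.
print(meets_thermostability(1.00, 0.92, temp_c=95, minutes=1, min_retained=90))
```

Treating a milder heat challenge as inconclusive, rather than as a pass, keeps the check consistent with the “at least” wording of each embodiment.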

In some embodiments, the thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2c2, or thermostable NgAgo.

In some embodiments, the thermostable CRISPR/Cas system protein is thermostable Cas9.

Thermostable nucleic acid-guided nucleases can be isolated, for example, by identifying them through sequence homology in the genomes of thermophilic organisms such as Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.

In another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease. The sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.

Methods of Making Collections of gNAs

Provided herein are methods that enable the generation of a large number of diverse gNAs (e.g., gRNAs), i.e., collections of gNAs, from any source nucleic acid (e.g., DNA). Methods provided herein can employ enzymatic methods including, but not limited to, digestion, ligation, extension, overhang filling, transcription, reverse transcription, and amplification.

Generally, the method can comprise providing a nucleic acid (e.g., DNA); employing a first enzyme (or combination of first enzymes) that cuts within the PAM sequence in the nucleic acid such that a residual nucleotide sequence from the PAM sequence is left; ligating an adapter that positions a type IIS restriction enzyme site (the site of an enzyme that cuts outside of, yet near, its recognition motif) at a distance that allows the PAM sequence to be eliminated; employing a second, type IIS enzyme (or combination of second enzymes) to eliminate the PAM sequence together with the adapter; and fusing a sequence that can be recognized by protein members of the nucleic acid-guided nuclease (e.g., CRISPR/Cas) system, for example, a gRNA stem-loop sequence. In some embodiments, the first enzymatic reaction cuts within the PAM sequence such that a residual nucleotide sequence from the PAM sequence is left, and such that the nucleotide immediately 5′ to the PAM sequence can be any purine or pyrimidine, not just a cytosine (for example, not just sites that are C/NGG or C/TAG).
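The adapter-positioning step can be illustrated with the S. pyogenes (SpCas9) strategy listed in Table 1, where MlyI is the type IIS enzyme. MlyI recognizes GAGTC and cuts 5 nt away to leave a blunt end. In the adapter of SEQ ID NO: 4421 the site reads GACTC on the top strand, i.e., it lies on the bottom strand, so the cut falls 5 nt upstream of it on the top strand: 2 nt of adapter plus the 3-nt HGG remnant of the PAM. The Python sketch below models this as string operations only; it assumes ideal ligation and digestion, and the fragment and function names are hypothetical.

```python
# Sketch: in-silico model of removing a residual PAM with a type IIS enzyme (MlyI).
MLYI_SITE = "GAGTC"     # MlyI recognition sequence; cuts GAGTC(5/5), blunt
MLYI_SITE_RC = "GACTC"  # the same site when it lies on the bottom strand
MLYI_OFFSET = 5
ADAPTER_SP = "GGGACTCGGATCCCTATAGTC"  # SEQ ID NO: 4421, shown in upper case
STEM_LOOP_DNA = (                     # SEQ ID NO: 3
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAAC"
    "TTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"
)


def mlyi_cut_positions(top: str) -> list:
    """Blunt-cut positions p (cut between top[p-1] and top[p]) for each MlyI site."""
    cuts = []
    for i in range(len(top) - len(MLYI_SITE) + 1):
        window = top[i:i + len(MLYI_SITE)]
        if window == MLYI_SITE:       # site on top strand: cut 5 nt to its 3' side
            cuts.append(i + len(MLYI_SITE) + MLYI_OFFSET)
        elif window == MLYI_SITE_RC:  # site on bottom strand: cut 5 nt to its left
            cuts.append(i - MLYI_OFFSET)
    return [p for p in cuts if 0 < p < len(top)]


def remove_pam_and_fuse(fragment_ending_in_hgg: str) -> str:
    """Ligate the SP adapter, cut with MlyI, keep the left piece, add the stem-loop."""
    ligated = fragment_ending_in_hgg.upper() + ADAPTER_SP
    cuts = mlyi_cut_positions(ligated)
    if len(cuts) != 1:
        raise ValueError(f"expected one MlyI cut, found {len(cuts)}")
    spacer = ligated[:cuts[0]]  # HGG and the adapter are released together
    return spacer + STEM_LOOP_DNA


fragment = "ACGTACGTACGTACGTACGT" + "TGG"  # hypothetical fragment ending in HGG
template = remove_pam_and_fuse(fragment)
print(template[:20], len(template))
```

Because the cut is set by its distance from the recognition site rather than by the PAM sequence itself, the nucleotide 5′ of the PAM can be any base, consistent with the statement above.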

Table 1 shows exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.

TABLE 1
Exemplary strategies for preparing a collection of guide nucleic acids.

CRISPR/Cas system species | PAM sequence | First enzyme/components | Strategy | 3′ adapter sequence with type IIS enzyme site (only one strand shown, 5′→3′)
Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence; nicks the other strand with T7 endonuclease I; blunt with T4 DNA polymerase; ligate to adapter; cut with MlyI to remove PAM and adapter; ligate gRNA stem-loop sequence at 3′ end | ggGACTCggatccctatagtc (SEQ ID NO: 4421)
Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut; blunt with T4 DNA polymerase; ligate to adapter SA; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422)
Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut; blunt with T4 DNA polymerase; ligate to adapter NM; cut with EcoRI to eliminate unwanted DNA and with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | TCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4428)
Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut; blunt with T4 DNA polymerase; ligate to adapter ST; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttacggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4429)
Treponema denticola (TD) | NAAAAC | Cly7489II | Cut; blunt with T4 DNA polymerase; ligate to adapter TD; cut with EcoP15I to remove PAM and adapter | tttagcggccgcctgecgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4430)

Table 2 shows additional exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.

TABLE 2
Additional exemplary strategies for preparing a collection of guide nucleic acids.

CRISPR/Cas system species | PAM sequence | First enzyme/component | Exemplary strategy | Adapter oligo sequences (with inosine overhangs, all in 5′→3′ direction)
Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence; nicks the other strand with T7 endonuclease I; ligate to adapter; cut with MlyI to remove PAM and 3′ adapter; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: ggggGACTCggatccctatagtgatacaaagacgatgacgacaagcg (SEQ ID NO: 4404); Adapter oligo 2: gcctcgagc*t*a*atacgactcactatagggatccaagtccc (SEQ ID NO: 4405), where * denotes a phosphorothioate backbone linkage
Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut; ligate to adapter SA; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: IttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcaggcggccgctaaaa (SEQ ID NO: 4423)
Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut; ligate to adapter NM; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: attTCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4424); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcGA (SEQ ID NO: 4425)
Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut; ligate to adapter ST; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: gcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4426); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcIG (SEQ ID NO: 4427)

Exemplary applications of the compositions and methods described herein are provided in FIG. 1, FIG. 2, FIG. 3, FIG. 4, FIG. 5, FIG. 6, and FIG. 7. The figures depict non-limiting exemplary embodiments of the present invention that include a method of constructing a gNA library (e.g., gRNA library) from input nucleic acids (e.g., DNA), such as genomic DNA (e.g., human genomic DNA).

In FIG. 1, the starting material can be fragmented genomic DNA (e.g., human) or other source DNA. These fragments are blunt-ended before constructing the library 101. T7 promoter adapters are ligated to the blunt-ended DNA fragments 102, which are then PCR amplified. Nt.CviPII is then used to generate a nick on one strand of the PCR product immediately 5′ to the CCD sequence 103. T7 Endonuclease I cleaves the opposite strand 1, 2, or 3 bp 5′ of the nick 104. The resulting DNA fragments are blunt-ended with T4 DNA Polymerase, leaving an HGG sequence at the end of each DNA fragment 105. The resulting DNA is cleaned and recovered on beads. An adapter carrying an MlyI recognition site is ligated to the blunt-ended DNA fragment immediately 3′ of the HGG sequence 106. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 107. The resulting DNA fragments are cleaned and recovered again on beads. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome 108. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
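Read in silico, the FIG. 1 route keeps the 20 nt lying immediately 5′ of each HGG on a strand (HGG being the complement of a CCD site nicked by CviPII on the other strand) and replaces HGG with a stem-loop. The Python sketch below models only that end result, assuming every site is cut perfectly and ignoring fragment ends and the other strand; the `stem_loop` placeholder is hypothetical because no stem-loop sequence is given here. One consequence of the model is that an N20 whose adjacent NGG is GGG is not produced at that position, because H excludes G.

```python
from __future__ import annotations

import re


def hgg_guides(top: str, spacer_len: int = 20, stem_loop: str = "<STEM_LOOP>") -> list[str]:
    """Predict gRNA templates from the top strand under an idealized FIG. 1 model.

    A CCD site (D = A, G or T) on the bottom strand reads as HGG (H = A, C or T)
    on the top strand. The model keeps the spacer_len nt immediately 5' of each
    HGG, drops HGG (as the MlyI digestion does), and appends the stem-loop.
    """
    top = top.upper()
    guides = []
    # Zero-width lookahead so that overlapping HGG sites are all reported.
    for match in re.finditer(r"(?=[ACT]GG)", top):
        hgg_start = match.start()
        if hgg_start >= spacer_len:  # require a full-length spacer upstream
            guides.append(top[hgg_start - spacer_len:hgg_start] + stem_loop)
    return guides


print(hgg_guides("ACGT" * 5 + "AGGTTTT", stem_loop=""))  # ['ACGTACGTACGTACGTACGT']
```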

In FIG. 2, the starting material can be intact genomic DNA (e.g., human) or other source DNA 201. Nt.CviPII and T7 Endonuclease I are used to generate nicks on each strand of the human genomic DNA, resulting in smaller DNA fragments 202. DNA fragments of 200-600 bp are size selected on beads, then ligated with Y-shaped adapters carrying a 5′ GG overhang. One strand of the Y-shaped adapter contains an MlyI recognition site, while the other strand contains a mutated MlyI site and a T7 promoter sequence 203. Because of these features, after PCR amplification, the T7 promoter sequence lies at the end of the fragment opposite the HGG sequence, and the MlyI site lies on the 3′ side of the HGG sequence 204. Digestion with MlyI generates a cleavage immediately 5′ of the HGG sequence 205. This blunt-end cleavage removes HGG together with the adapter sequence 206. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
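The 200-600 bp size selection in FIG. 2 can be pictured with a simple model in which each pair of nearby nicks on opposite strands acts as a double-stranded break at a single coordinate. The Python sketch below assumes exactly that (the 1-3 bp offset between nicks, the resulting overhangs, and the imperfect cutoffs of real bead selection are all ignored); the function names and the cut positions in the example are hypothetical.

```python
from __future__ import annotations


def fragment_lengths(length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the pieces left after double-stranded breaks at cut_positions."""
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    bounds = [0, *cuts, length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(lengths: list[int], lo: int = 200, hi: int = 600) -> list[int]:
    """Keep fragments within [lo, hi] bp, an idealized stand-in for bead selection."""
    return [n for n in lengths if lo <= n <= hi]


pieces = fragment_lengths(2000, [100, 400, 1200, 1500])
print(pieces)               # [100, 300, 800, 300, 500]
print(size_select(pieces))  # [300, 300, 500]
```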

In FIG. 3, the source DNA (e.g., genomic DNA) can be nicked 301, for example with a nicking enzyme. In some cases, the nicking enzyme can have a recognition site that is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD (where D represents a base other than C). Nicks can be proximal, surrounding a region containing the sequence (represented by the thicker line) that will be used to yield the guide RNA N20 sequence. When nicks on opposite strands are proximal, a double-stranded break can occur and lead to 5′ or 3′ overhangs 302. These overhangs can be repaired, for example with a polymerase (e.g., T4 DNA polymerase). In some cases, such as with 5′ overhangs, repair can comprise synthesizing a complementary strand. In some cases, such as with 3′ overhangs, repair can comprise removing the overhangs. Repair can result in a blunt end including the N20 guide sequence and a sequence complementary to the nick recognition sequence (e.g., HGG, where H represents a base other than G).
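Two small pieces of bookkeeping underlie FIG. 3: the HGG left at the repaired end is the reverse complement of the CCD nick site, and a polymerase such as T4 DNA polymerase blunts an end by filling in a 5′ overhang or trimming a 3′ overhang. The Python sketch below illustrates both under simplifying assumptions (ideal enzymes, and a duplex end described by a top strand written 5′→3′ and a bottom strand written 3′→5′ from the same left-hand start); the function names are hypothetical.

```python
from __future__ import annotations

# Complement of each IUPAC code; an ambiguity code complements to the code for
# the complementary base set (e.g. D = A/G/T pairs with H = T/C/A).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, accepting IUPAC ambiguity codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def blunt_right_end(top: str, bottom_3to5: str) -> tuple[str, str]:
    """Blunt the right-hand end of a duplex, as T4 DNA polymerase would.

    top         -- top strand, written 5'->3'
    bottom_3to5 -- bottom strand, written 3'->5', aligned base-for-base with top
    """
    top, bottom_3to5 = top.upper(), bottom_3to5.upper()
    if len(top) > len(bottom_3to5):
        # Top strand protrudes at its 3' end: the 3'->5' exonuclease trims it.
        top = top[:len(bottom_3to5)]
    elif len(bottom_3to5) > len(top):
        # Bottom strand protrudes at its 5' end: the polymerase extends the
        # top strand's 3' end by copying the overhang.
        top += bottom_3to5[len(top):].translate(IUPAC_COMPLEMENT)
    return top, bottom_3to5


print(revcomp("CCD"))  # HGG
```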

In FIG. 4, continuing for example from the end of FIG. 3, different combinations of adapters can be ligated to the DNA to allow for the desired cleaving. Adapters with a recognition site for a nuclease enzyme that cleaves a fixed distance outside its recognition site (e.g., MlyI, which cleaves 5 bp from its GAGTC site) can be ligated 401, and digestion at that site can be used to remove a leftover sequence, such as the 3-bp HGG sequence 402. Adapters with a recognition site for a nuclease that cuts 20 base pairs from the site (e.g., MmeI) can then be ligated 403. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from its site to later remove the first recognition site (e.g., BsaXI). The enzyme corresponding to the first recognition site (e.g., MmeI) can be used to cut 20 nucleotides downstream, thereby keeping the N20 sequence 404. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 405. Then, the nuclease corresponding to the second recognition site (e.g., BsaXI) can be used to remove the adapter for the site that cuts 20 nucleotides away (e.g., MmeI) 406. Finally, the guide RNA stem-loop sequence adapter can be ligated to the N20 sequence 407 to prepare for guide RNA production.
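Type IIS enzymes such as MlyI and MmeI cleave at a fixed distance from a recognition site that they do not cut within, which is what lets an adapter reach into the insert. The Python sketch below models only the cut on the strand carrying the site, assuming that site is read on the strand pointing into the insert; the sites and offsets (MlyI GAGTC, 5 nt; MmeI TCCRAC, 20 nt) are the commonly listed values for those enzymes, and the stagger on the opposite strand (MmeI leaves a 2-nt overhang) is ignored. The usage example rebuilds the SpCas9 adapter junction from Table 1: on the strand complementary to ggGACTC..., the MlyI site GAGTC is followed by 2 adapter nucleotides and the 3-nt CCD complement of HGG, so a 5-nt cut lands exactly at the edge of the N20 region. The function name and the poly-T insert are hypothetical.

```python
from __future__ import annotations

import re

# Recognition site (5'->3') and cut offset 3' of the site on the same strand.
TYPE_IIS = {
    "MlyI": ("GAGTC", 5),
    "MmeI": ("TCCRAC", 20),
}
CLASSES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def type_iis_cut(seq: str, enzyme: str) -> tuple[str, str]:
    """Split seq at the cut made downstream of the first recognition site."""
    site, offset = TYPE_IIS[enzyme]
    pattern = "".join(CLASSES[base] for base in site)
    match = re.search(pattern, seq.upper())
    if match is None:
        raise ValueError(f"no {enzyme} site in sequence")
    cut = match.end() + offset
    if cut > len(seq):
        raise ValueError("cut position falls beyond the end of the sequence")
    return seq[:cut], seq[cut:]


# Strand complementary to the Table 1 SpCas9 junction: MlyI site, 2 adapter
# nt (CC), the CCD complement of HGG (here CCA), then a hypothetical 20-nt
# insert standing in for the complement of N20.
left, right = type_iis_cut("GAGTC" + "CC" + "CCA" + "T" * 20, "MlyI")
print(left, right)  # left is GAGTCCCCCA; right is the 20-nt insert
```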

Alternatively, the protocol shown in FIG. 5 can follow the end of a protocol such as that shown in FIG. 3. Adapters with a recognition site for a nuclease enzyme that cleaves 25 nucleotides from the site (e.g., EcoP15I) can be ligated to the DNA 501. These adapters can also include a second recognition site for a nuclease (e.g., BaeI) that cuts the proper number of nucleotides from its site to later remove the first recognition site and any other leftover sequence, such as HGG. The enzyme corresponding to the first recognition site (e.g., EcoP15I) can then be used to cleave after the N20 sequence 502. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 503. The enzyme corresponding to the second recognition site (e.g., BaeI) can then be used to remove the recognition sites and any residual sequence (e.g., HGG) 504. Finally, the guide RNA stem-loop sequence adapter can be ligated (e.g., by single-strand ligation) to the N20 sequence 505.

As an alternative to protocols such as that shown in FIG. 3, the protocol shown in FIG. 6 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 601. In some cases, the nick recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand 602. Because of the DNA synthesis, the nick can be sealed and made available to be nicked again 603. Subsequent cycles of nicking and synthesis can be used to yield large amounts of target sequences 604. These single-stranded copies of target sequences can be made double-stranded, for example by random priming and extension. These double-stranded nucleic acids comprising N20 sequences can then be further processed by methods disclosed herein, such as those shown in FIG. 4 or FIG. 5.
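Because each round of synthesis regenerates the nicking site, every round can release another single-stranded copy. Under an idealized model that is an assumption rather than something stated above (a fixed number of nick sites, a constant per-round efficiency, and displaced strands that are not themselves copied), yield grows linearly with the number of cycles, as the Python sketch below tallies; the function name and parameters are hypothetical.

```python
from __future__ import annotations


def nemda_copies(cycles: int, nick_sites: int = 1, efficiency: float = 1.0) -> list[float]:
    """Cumulative displaced single-stranded copies after each nick/extend cycle.

    Idealized linear model: each nick site releases `efficiency` copies per
    cycle, and released single strands are not re-nicked or re-copied.
    """
    if cycles < 0 or nick_sites < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles and nick_sites must be >= 0; efficiency in [0, 1]")
    totals, running = [], 0.0
    for _ in range(cycles):
        running += nick_sites * efficiency
        totals.append(running)
    return totals


print(nemda_copies(3))  # [1.0, 2.0, 3.0]
```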

As another alternative to protocols such as that shown in FIG. 3 or FIG. 6, the protocol shown in FIG. 7 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 701. In some cases, the nicking enzyme recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand (e.g., nicking endonuclease-mediated strand-displacement DNA amplification (NEMDA)). The reaction parameters can be adjusted to control the size of the single-stranded DNA produced. For example, the nickase:polymerase ratio (e.g., the CviPII:Bst large fragment polymerase ratio) can be adjusted. Reaction temperature can also be adjusted. Next, an oligonucleotide can be added 704 that has (in the 5′→3′ direction) a promoter (e.g., T7 promoter) 702 followed by a random n-mer (e.g., random 6-mer, random 8-mer) 703. The random n-mer region can bind to a region of the single-stranded DNA generated previously. For example, binding can be conducted by denaturing at high temperature followed by rapid cooling, which can allow the random n-mer region to bind to the single-stranded DNA generated by NEMDA. In some cases, the DNA is denatured at 98° C. for 7 minutes and then cooled rapidly to 10° C. Extension and/or amplification can be used to produce double-stranded DNA. Blunt ends can be produced, for example enzymatically (e.g., by treatment with DNA polymerase I at 20° C.). This can result in fragments with one end at the promoter (e.g., T7 promoter) and the other end at a nicking enzyme recognition site (e.g., a CCD site). These fragments can then be purified, for example by size selection (e.g., by gel purification, capillary electrophoresis, or other fragment separation techniques).
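The FIG. 7 oligonucleotide is a promoter followed by a random n-mer, so its random region is a pool of 4^n sequences, and any one n-mer is expected about once per 4^n nt of random sequence. The Python sketch below builds such a primer template and computes that diversity. It assumes the widely used 18-nt T7 promoter TAATACGACTCACTATAG (which also appears within Adapter oligo 2 of Table 2); the exact promoter used in FIG. 7 is not stated, and the function names are hypothetical.

```python
# Assumed promoter: the commonly used 18-nt T7 promoter; transcription
# starts at its final G.
T7_PROMOTER = "TAATACGACTCACTATAG"


def random_primer(n: int, promoter: str = T7_PROMOTER) -> str:
    """5'->3' primer template: promoter followed by n random positions (N)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return promoter + "N" * n


def random_region_diversity(n: int) -> int:
    """Distinct sequences in a fully degenerate n-mer (4 bases per position)."""
    return 4 ** n


for n in (6, 8):
    print(random_primer(n), random_region_diversity(n))
```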
In some cases, the target fragments are about 50 base pairs in length (adapter sequence (e.g., T7 adapter) + target N20 sequence + nicking enzyme recognition site or its complement (e.g., HGG)). Fragments can then be ligated to an adapter comprising a recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705. For example, for a three-nucleotide nicking enzyme recognition site (e.g., CCD for CviPII), BaeI can be used. The appropriate nuclease (e.g., BaeI) can then be used to remove the nuclease recognition site and the nicking enzyme recognition site 706. The remaining nucleic acid sequence (e.g., the N20 site) can then be ligated to the final stem-loop sequence for the guide RNA 707. Amplification (e.g., PCR) can be conducted, and guide RNAs can be produced.
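The approximate 50 bp fragment length is the sum of its three parts. Taking the N20 sequence as 20 bp and the HGG remnant as 3 bp, it implies an adapter of roughly 27 bp. This back-calculation is only illustrative, since the adapter length is not given explicitly:

```latex
L_{\text{fragment}} = L_{\text{adapter}} + L_{\text{N20}} + L_{\text{HGG}}
                    = L_{\text{adapter}} + 20 + 3 \approx 50\ \text{bp}
\quad\Longrightarrow\quad L_{\text{adapter}} \approx 27\ \text{bp}
```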

In some embodiments, a collection of gNAs (e.g., gRNAs) targeting human mitochondrial DNA (mtDNA) is created that can be used for directing nucleic acid-guided nuclease (e.g., Cas9) proteins, comprising the nucleic acid-guided nuclease (e.g., Cas9) target sequence. In some embodiments, the targeting sequences of this collection of gNAs (e.g., gRNAs) are encoded by DNA sequences comprising at least the 20-nt sequence provided in the second column from the right of Table 3 (if the NGG sequence is on the positive strand) and Table 4 (if the NGG sequence is on the negative strand). In some embodiments, a collection of gRNA nucleic acids, as provided herein, with specificity for human mitochondrial DNA, comprises a plurality of members, wherein the members comprise a plurality of targeting sequences provided in the second column from the right of Table 3 and/or the second column from the right of Table 4.
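Each Table 3 row pairs a 23-nt positive-strand window (the 20-nt target followed by NGG) with the 20-nt target alone, using 1-based, inclusive start and end positions (e.g., 13 to 35). A minimal Python sketch of how such rows could be enumerated from a positive-strand sequence is given below; it does not handle the negative-strand sites of Table 4, whose coordinate convention is not restated here, and the function name is hypothetical. In the example, the twelve leading A's are hypothetical filler; only positions 13 to 36 are taken from the first two rows of Table 3.

```python
from __future__ import annotations


def plus_strand_ngg_rows(seq: str, spacer_len: int = 20) -> list[tuple[int, int, str, str]]:
    """Enumerate positive-strand NGG sites as (start, end, target+NGG, target).

    start and end are 1-based and inclusive, matching the Table 3 columns.
    """
    seq = seq.upper()
    window = spacer_len + 3  # target followed by the 3-nt NGG
    rows = []
    for i in range(len(seq) - window + 1):
        # NGG: the second and third PAM positions must both be G.
        if seq[i + spacer_len + 1] == "G" and seq[i + spacer_len + 2] == "G":
            rows.append((i + 1, i + window, seq[i:i + window], seq[i:i + spacer_len]))
    return rows


example = "A" * 12 + "ATCACCCTATTAACCACTCACGGG"
for row in plus_strand_ngg_rows(example):
    print(row)
```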

TABLE 3gRNA target sequence for human mtDNA carrying NGG sequence on the (+) strand.Chr end nt sequence on the (+) 20 nt gRNA target Chr start positionstrand containing gRNA sequence position (+ target sequence followed SEQ(will encode the gRNA SEQ (+ strand) strand) by NGG ID NO:targeting sequence) ID NO: 13 35 ATCACCCTATTAACCAC 13 ATCACCCTATTAACCA436 TCACGG CTCA 14 36 TCACCCTATTAACCACT 14 TCACCCTATTAACCAC 437 CACGGGTCAC 32 54 ACGGGAGCTCTCCATGC 15 ACGGGAGCTCTCCATG 438 ATTTGG CATT 45 67ATGCATTTGGTATTTTC 16 ATGCATTTGGTATTTT 439 GTCTGG CGTC 46 68TGCATTTGGTATTTTCGT 17 TGCATTTGGTATTTTC 440 CTGGG GTCT 47 69GCATTTGGTATTTTCGT 18 GCATTTGGTATTTTCG 441 CTGGGG TCTG 48 70CATTTGGTATTTTCGTCT 19 CATTTGGTATTTTCGTC 442 GGGGG TGG 49 71ATTTGGTATTTTCGTCTG 20 ATTTGGTATTTTCGTCT 443 GGGGG GGG 79 101GCGATAGCATTGCGAGA 21 GCGATAGCATTGCGAG 444 CGCTGG ACGC 85 107GCATTGCGAGACGCTGG 22 GCATTGCGAGACGCTG 445 AGCCGG GAGC 163 185GCACCTACGTTCAATAT 23 GCACCTACGTTCAATA 446 TACAGG TTAC 207 229GTTAATTAATTAATGCT 24 GTTAATTAATTAATGC 447 TGTAGG TTGT 301 323AACCCCCCCTCCCCCGC 25 AACCCCCCCTCCCCCG 448 TTCTGG CTTC 388 410AGATTTCAAATTTTATC 26 AGATTTCAAATTTTAT 449 TTTTGG CTTT 391 413TTTCAAATTTTATCTTTT 27 TTTCAAATTTTATCTTT 450 GGCGG TGG 604 626ATACACTGAAAATGTTT 28 ATACACTGAAAATGTT 451 AGACGG TAGA 605 627TACACTGAAAATGTTTA 29 TACACTGAAAATGTTT 452 GACGGG AGAC 631 653ACATCACCCCATAAACA 30 ACATCACCCCATAAAC 453 AATAGG AAAT 636 658ACCCCATAAACAAATAG 31 ACCCCATAAACAAATA 454 GTTTGG GGTT 727 749TCTAAATCACCACGATC 32 TCTAAATCACCACGAT 455 AAAAGG CAAA 788 810TTAGCCTAGCCACACCC 33 TTAGCCTAGCCACACC 456 CCACGG CCCA 789 811TAGCCTAGCCACACCCC 34 TAGCCTAGCCACACCC 457 CACGGG CCAC 851 873AACTAAGCTATACTAAC 35 AACTAAGCTATACTAA 458 CCCAGG CCCC 852 874ACTAAGCTATACTAACC 36 ACTAAGCTATACTAAC 459 CCAGGG CCCA 856 878AGCTATACTAACCCCAG 37 AGCTATACTAACCCCA 460 GGTTGG GGGT 880 902CAATTTCGTGCCAGCCA 38 CAATTTCGTGCCAGCC 461 CCGCGG ACCG 912 934TAACCCAAGTCAATAGA 39 TAACCCAAGTCAATAG 462 AGCCGG AAGC 1009 1031CACAAAATAGACTACG 40 ACACAAATAGACTACG 463 AAAGTGG 
AAAG 1051 1073ACAATAGCTAAGACCCA 41 ACAATAGCTAAGACCC 464 AACTGG AAAC 1052 1074CAATAGCTAAGACCCAA 42 CAATAGCTAAGACCCA 465 ACTGGG AACT 1148 1170AGCCACAGCTTAAAACT 43 AGCCACAGCTTAAAAC 466 CAAAGG TCAA 1154 1176AGCTTAAAACTCAAAGG 44 AGCTTAAAACTCAAAG 467 ACCTGG GACC 1157 1179TTAAAACTCAAAGGACC 45 TTAAAACTCAAAGGAC 468 TGGCGG CTGG 1178 1200GGTGCTTCATATCCCTC 46 GGTGCTTCATATCCCT 469 TAGAGG CTAG 1267 1289TCTTCAGCAAACCCTGA 47 TCTTCAGCAAACCCTG 470 TGAAGG ATGA 1306 1328AGTACCCACGTAAAGAC 48 AGTACCCACGTAAAGA 471 GTTAGG CGTT 1312 1334CACGTAAAGACGTTAGG 49 CACGTAAAGACGTTAG 472 TCAAGG GTCA 1326 1348AGGTCAAGGTGTAGCCC 50 AGGTCAAGGTGTAGCC 473 ATGAGG CATG 1329 1351TCAAGGTGTAGCCCATG 51 TCAAGGTGTAGCCCAT 474 AGGTGG GAGG 1339 1361GCCCATGAGGTGGCAA 52 GCCCATGAGGTGGCAA 475 GAAATGG GAAA 1340 1362CCCATGAGGTGGCAAG 53 CCCATGAGGTGGCAAG 476 AAATGGG AAAT 1389 1411GATAGCCCTTATGAAAC 54 GATAGCCCTTATGAAA 477 TTAAGG CTTA 1390 1412ATAGCCCTTATGAAACT 55 ATAGCCCTTATGAAAC 478 TAAGGG TTAA 1397 1419TTATGAAACTTAAGGGT 56 TTATGAAACTTAAGGG 479 CGAAGG TCGA 1400 1422TGAAACTTAAGGGTCGA 57 TGAAACTTAAGGGTCG 480 AGGTGG AAGG 1441 1463AGTAGAGTGCTTAGTTG 58 AGTAGAGTGCTTAGTT 481 AACAGG GAAC 1442 1464GTAGAGTGCTTAGTTGA 59 GTAGAGTGCTTAGTTG 482 ACAGGG AACA 1494 1516CCTCCTCAAGTATACTT 60 CCTCCTCAAGTATACT 483 CAAAGG TCAA 1530 1552ACCCCTACGCATTTATA 61 ACCCCTACGCATTTAT 484 TAGAGG ATAG 1548 1570AGAGGAGACAAGTCGT 62 AGAGGAGACAAGTCG 485 AACATGG TAACA 1560 1582TCGTAACATGGTAAGTG 63 TCGTAACATGGTAAGT 486 TACTGG GTAC 1573 1595AGTGTACTGGAAAGTGC 64 AGTGTACTGGAAAGTG 487 ACTTGG CACT 1620 1642AAAGCACCCAACTTACA 65 AAAGCACCCAACTTAC 488 CTTAGG ACTT 1726 1748CATTTACCCAAATAAAG 66 CATTTACCCAAATAAA 489 TATAGG GTAT 1746 1768AGGCGATAGAAATTGA 67 AGGCGATAGAAATTG 490 AACCTGG AAACC 1770 1792GCAATAGATATAGTACC 68 GCAATAGATATAGTAC 491 GCAAGG CGCA 1771 1793CAATAGATATAGTACCG 69 CAATAGATATAGTACC 492 CAAGGG GCAA 1809 1831TAACCAAGCATAATATA 70 TAACCAAGCATAATAT 493 GCAAGG AGCA 1862 1884TAACTAGAAATAACTTT 71 TAACTAGAAATAACTT 494 GCAAGG TGCA 1947 1969CCGTCTATGTAGCAAAA 72 
CCGTCTATGTAGCAAA 495 TAGTGG ATAG 1948 1970CGTCTATGTAGCAAAAT 73 CGTCTATGTAGCAAAA 496 AGTGGG TAGT 1960 1982AAAATAGTGGGAAGAT 74 AAAATAGTGGGAAGA 497 TTATAGG TTTAT 1966 1988GTGGGAAGATTTATAGG 75 GTGGGAAGATTTATAG 498 TAGAGG GTAG 1987 2009GGCGACAAACCTACCG 76 GGCGACAAACCTACCG 499 AGCCTGG AGCC 1997 2019CTACCGAGCCTGGTGAT 77 CTACCGAGCCTGGTGA 500 AGCTGG TAGC 2086 2108ATTTAACTGTTAGTCCA 78 ATTTAACTGTTAGTCC 501 AAGAGG AAAG 2099 2121TCCAAAGAGGAACAGC 79 TCCAAAGAGGAACAG 502 TCTTTGG CTCTT 2107 2129GGAACAGCTCTTTGGAC 80 GGAACAGCTCTTTGGA 503 ACTAGG CACT 2152 2174AAAAATTTAACACCCAT 81 AAAAATTTAACACCCA 504 AGTAGG TACT 2247 2269CTGAACTCCTCACACCC 82 CTGAACTCCTCACACC 505 AATTGG CAAT 2414 2436CCTCACTGTCAACCCAA 83 CCTCACTGTCAACCCA 506 CACAGG ACAC 2427 2449CCAACACAGGCATGCTC 84 CCAACACAGGCATGCT 507 ATAAGG CATA 2432 2454ACAGGCATGCTCATAAG 85 ACAGGCATGCTCATAA 508 GAAAGG GGAA 2449 2471GAAAGGTTAAAAAAAG 86 GAAAGGTTAAAAAAA 509 TAAAAGG GTAAA 2456 2478TAAAAAAAGTAAAAGG 87 TAAAAAAAGTAAAAG 510 AACTCGG GAACT 2515 2537TCTAGCATCACCAGTAT 88 TCTAGCATCACCAGTA 511 TAGAGG TTAG 2546 2568GCCCAGTGACACATGTT 89 GCCCAGTGACACATGT 512 TAACGG TTAA 2552 2574TGACACATGTTTAACGG 90 TGACACATGTTTAACG 513 CCGCGG GCCG 2571 2593GCGGTACCCTAACCGTG 91 GCGGTACCCTAACCGT 514 CAAAGG GCAA 2599 2621TAATCACTTGTTCCTTA 92 TAATCACTTGTTCCTT 515 AATAGG AAAT 2600 2622AATCACTTGTTCCTTAA 93 AATCACTTGTTCCTTA 516 ATAGGG AATA 2614 2636TAAATAGGGACCTGTAT 94 TAAATAGGGACCTGTA 517 GAATGG TGAA 2624 2646CCTGTATGAATGGCTCC 95 CCTGTATGAATGGCTC 518 ACGAGG CACG 2625 2647CTGTATGAATGGCTCCA 96 CTGTATGAATGGCTCC 519 CGAGGG ACGA 2676 2698AAATTGACCTGCCCGTG 97 AAATTGACCTGCCCGT 520 AAGAGG GAAG 2679 2701TTGACCTGCCCGTGAAG 98 TTGACCTGCCCGTGAA 521 AGGCGG GAGG 2680 2702TGACCTGCCCGTGAAGA 99 TGACCTGCCCGTGAAG 522 GGCGGG AGGC 2711 2733AGCAAGACGAGAAGAC 100 AGCAAGACGAGAAGA 523 CCTATGG CCCTA 2755 2777ACAGTACCTAACAAACC 101 ACAGTACCTAACAAAC 524 CACAGG CCAC 2789 2811CAAACCTGCATTAAAAA 102 CAAACCTGCATTAAAA 525 TTTCGG ATTT 2793 2815CCTGCATTAAAAATTTC 103 CCTGCATTAAAAATTT 526 GGTTGG CGGT 2794 
2816CTGCATTAAAAATTTCG 104 CTGCATTAAAAATTTC 527 GTTGGG GGTT 2795 2817TGCATTAAAAATTTCGG 105 TGCATTAAAAATTTCG 528 TTGGGG GTTG 2804 2826AATTTCGGTTGGGGCGA 106 AATTTCGGTTGGGGCG 529 CCTCGG ACCT 2895 2917TGATCCAATAACTTGAC 107 TGATCCAATAACTTGA 530 CAACGG CCAA 2911 2933CCAACGGAACAAGTTAC 108 CCAACGGAACAAGTTA 531 CCTAGG CCCT 2912 2934CAACGGAACAAGTTACC 109 CAACGGAACAAGTTAC 532 CTAGGG CCTA 2954 2976CTAGAGTCCATATCAAC 110 CTAGAGTCCATATCAA 533 AATAGG CAAT 2955 7977TAGAGTCCATATCAACA 111 TAGAGTCCATATCAAC 534 ATAGGG AATA 2974 2996AGGGTTTACGACCTCGA 112 AGGGTTTACGACCTCG 535 TGTTGG ATGT 2980 3002TACGACCTCGATGTTGG 113 TACGACCTCGATGTTG 536 ATCAGG GATC 2992 3014GTTGGATCAGGACATCC 114 GTTGGATCAGGACATC 537 CGATGG CCGA 3010 3032GATGGTGCAGCCGCTAT 115 GATGGTGCAGCCGCTA 538 TAAAGG TTAA 3058 3080TACGTGATCTGAGTTCA 116 TACGTGATCTGAGTTC 539 GACCGG AGAC 3069 3091AGTTCAGACCGGAGTAA 117 AGTTCAGACCGGAGTA 540 TCCAGG ATCC 3073 3095CAGACCGGAGTAATCCA 118 CAGACCGGAGTAATCC 541 GGTCGG AGGT 3110 3132CAAATTCCTCCCTGTAC 119 CAAATTCCTCCCTGTA 542 GAAAGG CGAA 3125 3147ACGAAAGGACAAGAGA 120 ACGAAAGGACAAGAG 543 AATAAGG AAATA 3203 3225ACCCACACCCACCCAAG 121 ACCCACACCCACCCAA 544 AACAGG GAAC 3204 3226CCCACACCCACCCAAGA 122 CCCACACCCACCCAAG 545 ACAGGG AACA 3217 3239AAGAACAGGGTTTGTTA 123 AAGAACAGGGTTTGTT 546 AGATGG AAGA 3227 3249TTTGTTAAGATGGCAGA 124 TTTGTTAAGATGGCAG 547 GCCCGG AGCC 3262 3284ACTTAAAACTTTACAGT 125 ACTTAAAACTTTACAG 548 CAGAGG TCAG 3294 3316TCTTCTTAACAACATAC 126 TCTTCTTAACAACATA 549 CCATGG CCCA 3336 3358TGTACCCATTCTAATCG 127 TGTACCCATTCTAATC 550 CAATGG GCAA 3370 3392CTTACCGAACGAAAAAT 128 CTTACCGAACGAAAAA 551 TCTAGG TTCT 3391 3413GGCTATATACAACTACG 129 GGCTATATACAACTAC 552 CAAAGG GCAA 3406 3428CGCAAAGGCCCCAACGT 130 CGCAAAGGCCCCAAC 553 TGTAGG GTTGT 3415 3437CCCAACGTTGTAGGCCC 131 CCCAACGTTGTAGGCC 554 CTACGG CCTA 3416 3438CCAACGTTGTAGGCCCC 132 CCAACGTTGTAGGCCC 555 TACGGG CTAC 3570 3592CCTCCCCATACCCAACC 133 CCTCCCCATACCCAAC 556 CCCTGG CCCC 3586 3608CCCCTGGTCAACCTCAA 134 CCCCTGGTCAACCTCA 557 CCTAGG ACCT 3643 
3665GTTTACTCAATCCTCTG 135 GTTTACTCAATCCTCT 558 ATCAGG GATC 3644 3666TTTACTCAATCCTCTGA 136 TTTACTCAATCCTCTG 559 TCAGGG ATCA 3676 3698AACTCAAACTACGCCCT 137 AACTCAAACTACGCCC 560 GATCGG TGAT 3757 3779CTATCAACATTACTAAT 138 CTATCAACATTACTAA 561 AAGTGG TAAG 3828 3850ACTCCTGCCATCATGAC 139 ACTCCTGCCATCATGA 562 CCTTGG CCCT 3892 3914ACCCCCTTCGACCTTGC 140 ACCCCCTTCGACCTTG 563 CGAAGG CCGA 3893 3915CCCCCTTCGACCTTGCC 141 CCCCCTTCGACCTTGC 564 GAAGGG CGAA 3894 3916CCCCTTCGACCTTGCCG 142 CCCCTTCGACCTTGCC 565 AAGGGG GAAG 3913 3935GGGGAGTCCGAACTAGT 143 GGGGAGTCCGAACTA 566 CTCAGG GTCTC 3937 3959TTCAACATCGAATACGC 144 TTCAACATCGAATACG 567 CGCAGG CCGC 4015 4037CTCACCACTACAATCTT 145 CTCACCACTACAATCT 568 CCTAGG TCCT 4287 4309ACTTTGATAGAGTAAAT 146 ACTTTGATAGAGTAAA 569 AATAGG TAAT 4311 4333GCTTAAACCCCCTTATT 147 GCTTAAACCCCCTTAT 570 TCTAGG TTCT 4386 4408TCACACCCCATCCTAAA 148 TCACACCCCATCCTAA 571 GTAAGG AGTA 4406 4428AGGTCAGCTAAATAAGC 149 AGGTCAGCTAAATAAG 572 TATCGG CTAT 4407 4429GGTCAGCTAAATAAGCT 150 GGTCAGCTAAATAAGC 573 ATCGGG TATC 4428 4450GGCCCATACCCCGAAAA 151 GGCCCATACCCCGAAA 574 TGTTGG ATGT 4460 4482TCCCGTACTAATTAATC 152 TCCCGTACTAATTAAT 575 CCCTGG CCCC 4494 4516ATCTACTCTACCATCTTT 153 ATCTACTCTACCATCT 576 GCAGG TTGC 4542 4564CACTGATTTTTTACCTG 154 CACTGATTTTTTACCT 577 AGTAGG GAGT 4692 4714CTCTTCAACAATATACT 155 CTCTTCAACAATATAC 578 CTCCGG TCTC 4767 4789ATAGCTATAGCAATAAA 156 ATAGCTATAGCAATAA 579 ACTAGG AACT 4799 4821CTTTCACTTCTGAGTCC 157 CTTTCACTTCTGAGTC 580 CAGAGG CCAG 4809 4831TGAGTCCCAGAGGTTAC 158 TGAGTCCCAGAGGTTA 581 CCAAGG CCCA 4827 4849CAAGGCACCCCTCTGAC 159  CAAGGCACCCCTCTGA 582 ATCCGG CATC 4941 4963TCAATCTTATCCATCAT 160 TCAATCTTATCCATCA 583 AGCAGG TAGC 4950 4972TCCATCATAGCAGGCAG 161 TCCATCATAGCAGGCA 584 TTGAGG GTTG 4953 4975ATCATAGCAGGCAGTTG 162 ATCATAGCAGGCAGTT 585 AGGTGG GAGG 5010 5032TACTCCTCAATTACCCA 163 TACTCCTCAATTACCC 586 CATAGG ACAT 5202 5274CCATCCACCCTCCTCTC 164 CCATCCACCCTCCTCT 587 CCTAGG CCCT 5205 5227TCCACCCTCCTCTCCCT 165 TCCACCCTCCTCTCCCT 588 AGGAGG AGG 5223 
5245GGAGGCCTGCCCCCGCT 166 GGAGGCCTGCCCCCGC 589 AACCGG TAAC 539 5261TAACCGGCTTTTTGCCC 167 TAACCGGCTTTTTGCC 590 AAATGG CAAA 5240 5262AACCGGCTTTTTGCCCA 168 AACCGGCTTTTTGCCC 591 AATGGG AAAT 5500 5522TAATAATCTTATAGAAA 169 TAATAATCTTATAGAA 592 TTTAGG ATTT 5569 5591CTTAATTTCTGTAACAG 170 CTTAATTTCTGTAACA 593 CTAAGG GCTA 5646 5668CTAAGCCCTTACTAGAC 171 CTAAGCCCTTACTAGA 594 CAATGG CCAA 5647 5669TAAGCCCTTACTAGACC 172 TAAGCCCTTACTAGAC 595 AATGGG CAAT 5697 5719AGCTAAGCACCCTAATC 173 AGCTAAGCACCCTAAT 596 AACTGG CAAC 5723 5745CAATCTACTTCTCCCGC 174 CAATCTACTTCTCCCG 597 CGCCGG CCGC 5724 5746AATCTACTTCTCCCGCC 175 AATCTACTTCTCCCGC 598 GCCGGG CGCC 5732 5754TCTCCCGCCGCCGGGAA 176 TCTCCCGCCGCCGGGA 599 AAAAGG AAAA 5735 5757CCCGCCGCCGGGAAAA 177 CCCGCCGCCGGGAAA 600 AAGGCGG AAAGG 5736 5758CCGCCGCCGGGAAAAA 178 CCGCCGCCGGGAAAA 601 AGGCGGG AAGGC 5747 5769AAAAAAGGCGGGAGAA 179 AAAAAAGGCGGGAGA 602 GCCCCGG AGCCC 5751 5773AAGGCGGGAGAAGCCC 180 AAGGCGGGAGAAGCC 603 CGGCAGG CCGGC 5800 5822ATTCAATATGAAAATCA 181 ATTCAATATGAAAATC 604 CCTCGG ACCT 5806 5828TATGAAAATCACCTCGG 182 TATGAAAATCACCTCG 605 ACTCTGG GAGC 5816 5838ACCTCGGAGCTGGTAAA 183 ACCTCGGAGCTGGTAA 606 AAGAGG AAAG 5928 5950TCTACAAACCACAAAGA 184 TCTACAAACCACAAAG 607 CATTGG ACAT 5949 5971GGAACACTATACCTATT 185 GGAACACTATACCTAT 608 ATTCGG TATT 5961 5983CTATTATTCGGCGCATG 186 CTATTATTCGGCGCAT 609 AGCTGG GAGC 5970 5992GGCGCATGAGCTGGAGT 187 GGCGCATGAGCTGGA 610 CCTAGG GTCCT 6005 6027CCTCCTTATTCGAGCCG 188 CCTCCTTATTCGAGCC 611 AGCTGG GAGC 6006 6028CTCCTTATTCGAGCCGA 189 CTCCTTATTCGAGCCG 612 GCTGGG AGCT 6027 6049GGCCAGCCAGGCAACCT 190 GGCCAGCCAGGCAAC 613 TCTAGG CTTCT 6108 3130ATAGTAATACCCATCAT 191 ATAGTAATACCCATCA 614 AATCGG TAAT 6111 6133GTAATACCCATCATAAT 192 GTAATACCCATCATAA 615 CGGAGG TCGG 6117 6139CCCATCATAATCGGAGG 193 CCCATCATAATCGGAG 616 CTTTGG GCTT 6144 6166TGACTAGTTCCCCTAAT 194 TGACTAGTTCCCCTAA 617 AATCGG TAAT 6158 6180AATAATCGGTGCCCCCG 195 AATAATCGGTGCCCCC 618 ATATGG GATA 6236 6258CCTGCTCGCATCTGCTA 196 CCTGCTCGCATCTGCT 619 TAGTGG ATAG 6239 
6261GCTCGCATCTGCTATAG 197 GCTCGCATCTGCTATA 620 TGGAGG GTGG 6243 6265GCATCTGCTATAGTGGA 198 GCATCTGCTATAGTGG 621 GGCCGG AGGC 6249 6271GCTATAGTGGAGGCCGG 199 GCTATAGTGGAGGCCG 622 AGCAGG GAGC 6255 6277GTGGAGGCCGGAGCAG 200 GTGGAGGCCGGAGCA 623 GAACAGG GGAAC 6282 6304ACAGTCTACCCTCCCTT 201 ACAGTCTACCCTCCC 624 AGCAGG TAGC 6283 6305CAGTCTACCCTCCCTTA 202 CAGTCTACCCTCCCTT 625 GCAGGG AGCA 6300 6322GCAGGGAACTACTCCCA 203 GCAGGGAACTACTCCC 626 CCCTGG ACCC 6342 6364ATCTTCTCCTTACACCT 204 ATCTTCTCCTTACACCT 627 AGCAGG AGC 6360 6382GCAGGTGTCTCCTCTAT 205 GCAGGTGTCTCCTCTA 628 CTTAGG TCTT 6361 6383CAGGTGTCTCCTCTATC 206 CAGGTGTCTCCTCTAT 629 TTAGGG CTTA 6362 6384AGGTGTCTCCTCTATCT 207 AGGTGTCTCCTCTATC 630 TAGGGG TTAG 6495 6517TCTCTCCCAGTCCTAGC 208 TCTCTCCCAGTCCTAG 631 TGCTGG CTGC 6552 6574ACCACCTTCTTCGACCC 209 ACCACCTTCTTCGACC 632 CGCCGG CCGC 6555 6577ACCTTCTTCGACCCCGC 210 ACCTTCTTCGACCCCG 633 CGGAGG CCGG 6558 6580TTCTTCGACCCCGCCGG 211 TTCTTCGACCCCGCCG 634 AGGAGG GAGG 6597 6619CAACACCTATTCTGATT 212 CAACACCTATTCTGAT 635 TTTCGG TTTT 6630 6652GTTTATATTCTTATCCTA 213 GTTTATATTCTTATCCT 636 CCAGG ACC 6636 6658ATTCTTATCCTACCAGG 214 ATTCTTATCCTACCAG 637 CTTCGG GCTT 6669 6691CATATTGTAACTTACTA 215 CATATTGTAACTTACT 638 CTCCGG ACTC 6687 6709TCCGGAAAAAAAGAAC 216 TCCGGAAAAAAAGAA 639 CATTTGG CCATT 6696 6718AAAGAACCATTTGGATA 217 AAAGAACCATTTGGAT 640 CATAGG ACAT 6701 6723ACCATTTGGATACATAG 218 ACCATTTGGATACATA 641 GTATGG GGTA 6723 6745GTCTGAGCTATGATATC 219 GTCTGAGCTATGATAT 642 AATTGG CAAT 6732 6754ATGATATCAATTGGCTT 220 ATGATATCAATTGGCT 643 CCTAGG TCCT 6713 6755TGATATCAATTGGCTTC 221 TGATATCAATTGGCTT 644 CTAGGG CCTA 6768 6790GCACACCATATATTTAC 222 GCACACCATATATTTrA 645 AGTAGG CAGT 6831 6853ATAATCATCGCTATCCC 223 ATAATCATCGCTATCC 646 CACCGG CCAC 6867 6889AGCTGACTCGCCACACT 224 AGCTGACTCGCCACAC 647 CCACGG TCCA 6909 6931GCTGCAGTGCTCTGAGC 225 GCTGCAGTGCTCTGAG 648 CCTAGG CCCT 6933 6955TTCATCTTTCTTTTCACC 226 TTCATCTTTCTTTTCAC 649 GTAGG CGT 6936 6958ATCTTTCTTTTCACCGTA 227 ATCTTTCTTTTCACCGT 650 GGTGG AGG 6945 
6967TTCACCGTAGGTGGCCT 228 TTCACCGTAGGTGGCC 651 GACTGG TGAC 7032 7054TTCCACTATGTCCTATC 229 TTCCACTATGTCCTAT 652 AATAGG CAAT 7053 7075GGAGCTGTATTTGCCAT 230 GGAGCTGTATTTGCCA 653 CATAGG TCAT 7056 7078GCTGTATTTGCCATCAT 231 GCTGTATTTGCCATCA 654 AGGAGG TAGG 7086 7108CACTGATTTCCCCTATT 232 CACTGATTTCCCCTAT 655 CTCAGG TCTC 7140 7162CATTTCACTATCATATT 233 CATTTCACTATCATAT 656 CATCGG TCAT 7176 7198TTCTTCCCACAACACTT 234 TTCTTCCCACAACACT 657 TCTCGG TTCT 7185 7207CAACACTTTCTCGGCCT 235 CAACACTTTCTCGGCC 658 ATCCGG TATC 7205 7227CGGAATGCCCCGACGTT 236 CGGAATGCCCCGACGT 659 ACTCGG TACT 7251 7273TGAAACATCCTATCATC 237 TGAAACATCCTATCAT 660 TGTAGG CTGT 7358 7380AGAAGAACCCTCCATAA 238 AGAAGAACCCTCCATA 661 ACCTGG AACC 7371 7393ATAAACCTGGAGTGACT 239 ATAAACCTGGAGTGAC 662 ATATGG TATA 7432 7454ACATAAAATCTAGACAA 240 ACATAAAATCTAGACA 663 AAAAGG AAAA 7436 7458AAAATCTAGACAAAAA 241 AAAATCTAGACAAAA 664 AGGAAGG AAGGA 7457 7479GGAATCGAACCCCCCAA 242 GGAATCGAACCCCCCA 665 AGCTGG AAGC 7476 7498CTGGTTTCAAGCCAACC 243 CTGGTTTCAAGCCAAC 666 CCATGG CCCA 7499 7521CCTCCATGACTTTTTCA 244 CCTCCATGACTTTTTC 667 AAAAGG AAAA 7544 7566CTTTGTCAAAGTTAAAT 245 CTTTGTCAAAGTTAAA 668 TATAGG TTAT 7567 7589CTAAATCCTATATATCT 246 CTAAATCCTATATATC 669 TAATGG TTAA 7586 7608ATGGCACATGCAGCGCA 247 ATGGCACATGCAGCGC 670 AGTAGG AAGT 7741 7763TACTAACATCTCAGACG 248 TACTAACATCTCAGAC 671 CTCAGG GCTC 7831 7853CATCCTTTACATAACAG 249 CATCCTTTACATAACA 672 ACGAGG GACG 7865 7887TCCCTTACCATCAAATC 250 TCCCTTACCATCAAAT 673 AATTGG CAAT 7875 7897TCAAATCAATTGGCCAC 251 TCAAATCAATTGGCCA 674 CAATGG CCAA 7904 7926ACCTACGAGTACACCGA 252 ACCTACGAGTACACCG 675 CTACGG ACTA 7907 7929TACGAGTACACCGACTA 253 TACGAGTACACCGACT 676 CGGCGG ACGG 7955 7977CCCCCATTATTCCTAGA 254 CCCCCATTTTCCTAG 677 ACCAGG AACC 8069 8091TCATGAGCTGTCCCCAC 255 TCATGAGCTGTCCCCA 678 ATTAGG CATT 8093 8115TTAAAAACAGATGCAAT 256 TTAAAAACAGATGCAA 679 TCCCGG TTCC 8131 8153CACTTTCACCGCTACAC 257 CACTTTCACCGCTACA 680 GACCGG CGAC 8132 8154ACTTTCACCGCTACACG 258 ACTTTCACCGCTACAC 681 ACCGGG GACC 8133 
8155CTTTCACCGCTACACGA 259 CTTTCACCGCTACACG 682 CCGGGG ACCG 8134 8156TTTCACCGCTACACGAC 260 TTTCACCGCTACACGA 683 CGGGGG CCGG 8144 8166ACACGACCGGGGGTAT 261 ACACGACCGGGGGTAT 684 ACTACGG ACTA 8165 8187GGTCAATGCTCTGAAAT 262 GGTCAATGCTCTGAAA 685 CTGTGG TCTG 8228 8250CCCCTAAAAATCTTTGA 263 CCCCTAAAAATCTTTG 686 AATAGG AAAT 8229 8251CCCTAAAAATCTTTGAA 264 CCCTAAAAATCTTTGA 687 ATAGGG AATA 8370 8392CCCAACTAAATACTACC 265 CCCAACTAAATACTAC 688 GTATGG CGTA 8551 8573TTCATTGCCCCCACAAT 266 TTCATTTGCCCCCACAA 689 CCTAGG TCCT 8698 8720ATAACCATACACAACAC 267 ATAACCATACACAACA 690 TAAAGG CTAA 8761 8783ATTGCCACAACTAACCT 268 ATTGCCACAACTAACC 691 CCTCGG TCCT 8817 8839ACTATCTATAAACCTAG 269 ACTATCTATAAACCTA 692 CCATGG GCCA 8835 8857CATGGCCATCCCCTTAT 270 CATGGCCATCCCCTTA 693 GAGCGG TGAG 8836 8858ATGGCCATCCCCTTATG 271 ATGGCCATCCCCTTAT 694 AGCGGG GAGC 8851 8873TGAGCGGGCACAGTGAT 272 TGAGCGGGCACAGTG 695 TATAGG ATTAT 8899 8921CTAGCCCACTTCTTACC 273 CTAGCCCACTTCTTAC 696 ACAAGG CACA 8973 8995ACTCATTCAACCAATAG 274 ACTCATTCAACCAATA 697 CCCTGG GCCC 9004 9026CTAACCGCTAACATTACC 275 CTAACCGCTAACATTA 698 TGCAGG CTGC 9028 9050CACCTACTCATGCACCT 276 CACCTACTCATGCACC 699 AATTGG TAAT 9243 9265CCCAGCCCATGACCCCT 277 CCCAGCCCATGACCCC 700 AACAGG TAAC 9244 9266CCAGCCCATGACCCCTA 278 CCAGCCCATGACCCCT 701 ACAGGG AACA 9245 9267CAGCCCATGACCCCTAA 279 CAGCCCATGACCCCTA 702 CAGGGG ACAG 9273 9295TCAGCCCTCCTAATGAC 280 TCAGCCCTCCTAATGA 703 CTCCGG CCTC 9321 9343TCCATAACGCTCCTCAT 281 TCCATAACGCTCCTCA 704 ACTAGG TACT 9358 9380CACTAACCATATACCAA 282 CACTAACCATATACCA 705 TGATGG ATGA 9390 9412ACACGAGAAAGCACAT 283 ACACGAGAAAGCACA 706 ACCAAGG TACCA 9417 9439CACACACCACCTGTCCA 284 CACACACCACCTGTCC 707 AAAAGG AAAA 9429 9451GTCCAAAAAGGCCTTCG 285 GTCCAAAAAGGCCTTC 708 ATACGG GATA 9430 9452TCCAAAAAGGCCTTCGA 286 TCCAAAAAGGCCTTCG 709 TACGGG ATAC 9471 9493TCAGAAGTTTTTTTCTTC 287 TCAGAAGTTTTTTTCTT 710 GCAGG CGC 9522 9544CTAGCCCCTACCCCCCA 288 CTAGCCCCTACCCCCC 711 ATTAGG AATT 9525 9547GCCCCTACCCCCCAATT 289 GCCCCTACCCCCCAAT 712 AGGAGG TAGG 9526 
9548CCCCTACCCCCCAATTA 290 CCCCTACCCCCCAATT 713 GGAGGG AGGA 9532 9554CCCCCCAATTAGGAGGG 291 CCCCCCAATTAGGAGG 714 CACTGG GCAC 9543 9565GGAGGGCACTGGCCCCC 292 GGAGGGCCACTGGCCCC 715 AACAGG CAAC 9606 9628ACATCCGTATTACTCGC 293 ACATCCGTATTACTCG 716 ATCAGG CATC 9692 9714ACTGCTTATTACAATTT 294 ACTGCTTATTACAATT 717 TACTGG TTAC 9693 9715CTGCTTATTACAATTTT 295 CTGCTTATTACAATTTT 718 ACTGGG ACT 9756 9778TCTCCCTTCACCATTTCC 296 TCTCCCTTCACCATTTC 719 GACGG CGA 9765 9787ACCATTTCCGACGGCAT 297 ACCATTTCCGACGGCA 720 CTACGG TCTA 9789 9811TCAACATTTTTTGTAGC 298 TCAACATTTTTTGTAG 721 CACAGG CCAC 9798 9820TTTGTAGCCACAGGCTT 299 TTTGTAGCCACAGGCT 722 CCACGG TCCA 9816 9838CACGGACTTCACGTCAT 300 CACGGACTTCACGTCA 723 TATTGG TTAT 9885 9907TTTACATCCAAACATCA 301 TTTACATCCAAACATC 724 CTTTGG ACTT 9910 9932TCGAAGCCGCCGCCTGA 302 TCGAAGCCGCCGCCTG 725 TACTGG ATAC 9926 9948ATACTGGCATTTTGTAG 303 ATACTGGCATTTTGTA 726 ATGTGG GATG 9963 9985TATGTCTCCATCTATTG 304 TATGTCTCCATCTATT 727 ATGAGG GATG 9964 9986ATGTCTCCATCTATTGA 305 ATGTCTCCATCTATTG 728 TGAGGG ATGA 10122 10144TTTTGACTACCACAACT 306 TTTTGACTACCACAAC 729 CAACGG TCAA 10155 10177AAATCCACCCCTTACGA 307 AAATCCACCCCTTACG 730 GTGCGG AGTG 10343 10365CATCATCCTAGCCCTAA 308 CATCATCCTAGCCCTA 731 GTCTGG AGTC 10365 10387GCCTATGAGTGACTACA 309 GCCTATGAGTGACTAC 732 AAAAGG AAAA 10385 10407AGGATTAGACTGAACCG 310 AGGATTAGACTGAACC 733 AATTGG GAAT 10500 10522GCATTTACCATCTCACT 311 GCATTTACCATCTCAC 734 TCTAGG TTCT 10551 10573TCCTCCCTACTATGCCT 312 TCCTCCCTACTATGCC 735 AGAAGG TAGA 10664 10686CTTTGCCGCCTGCGAAG 313 CTTTGCCGCCTGCGAA 736 CAGCGG GCAG 10667 10689TGCCGCCTGCGAAGCAG 314 TGCCGCCTGCGAAGCA 737 CGGTGG GCGG 10668 10690GCCGCCTGCGAAGCAGC 315 GCCGCCTGCGAAGCAG 738 GGTGGG CGGT 10704 10726GTCTCAATCTCCAACAC 316 GTCTCAATCTCCAACA 739 ATATGG CATA 10972 10994ACTCCTACCCCTCACAA 317 ACTCCTACCCCTCACA 740 TCATGG ATCA 11128 11150AACCACACTTATCCCCA 318 AACCACACTTATCCCC 741 CCTTGG ACCT 11147 11169TTGGCTATCATCACCCG 319 TTGGCTATCATCACCC 742 ATGAGG GATG 11174 11196CAGCCAGAACGCCTGA 320 CAGCCAGAACGCCTGA 743 
ACGCAGG ACGC 11204 11226TTCCTATTCTACACCCT 321 TTCCTATTCTACACCCT 744 AGTAGG AGT 11252 11274ATTTACACTCACAACAC 322 ATTTACACTCACAACA 745 CCTAGG CCCT 11369 11391ATAGTAAAGATACCTCT 323 ATAGTAAAGATACCTC 746 TTACGG TTTA 11417 11439CATGTCGAAGCCCCCAT 324 CATGTCGAAGCCCCCA 747 CGCTGG TCGC 11418 11440ATGTCGAAGCCCCCATC 325 ATGTCGAAGCCCCCAT 748 GCTGGG CGCT 11453 11475GCCGCAGTACTCTTAAA 326 GCCGCAGTACTCTTAA 749 ACTAGG AACT 11456 11478GCAGTACTCTTAAAACT 327 GCAGTACTCTTAAAAC 750 AGGCGG TAGG 11462 11484CTCTTAAAACTAGGCGG 328 CTCTTAAAACTAGGCG 751 CTATGG GCTA 11540 11562TTCCTTGTACTATCCCTA 329 TTCCTTGTACTATCCCT 752 TGAGG ATG 11669 11691CAAACCCCCTGAAGCTT 330 CAAACCCCCTGAAGCT 753 CACCGG TCAC 11696 11718GTCATTCTCATAATCGC 331 GTCATTCTCATAATCG 754 CCACGG CCCA 11697 11719TCATTCTCATAATCGCC 332 TCATTCTCATAATCGC 755 CACGGG CCAC 11777 11799CGCATCATAATCCTCTC 333 CGCATCATAATCCTCT 756 TCAAGG CTCA 11866 11888ACCCCCCACTATTAACC 334 ACCCCCACTATTAAC 757 TACTGG CTAC 11867 11889CCCCCCACTATTAACCT 335 CCCCCCACTATTAACC 758 ACTGGG TACT 11927 11949AATATCACTCTCCTACT 336 AATATCACTCTCCTAC 759 TACAGG TTAC 11985 12007ACATATTTACCACAACA 337 ACATATTTACCACAAC 760 CAATGG ACAA 11986 12008CATATTTACCACAACAC 338 CATATTTACCACAACA 761 AATGGG CAAT 11987 12009ATATTTACCACAACACA 339 ATATTTACCACAACAC 762 ATGGGG AATG 12104 12126CTCAACCCCGACATCAT 340 CTCAACCCCGACATCA 763 TACCGG TTAC 12105 12127TCAACCCCGACATCATT 341 TCAACCCCGACATCAT 764 ACCGGG TACC 12164 12186GATTGTGAATCTGACAA 342 GATTGTGAATCTGACA 765 CAGAGG ACAG 12235 12257TGCCCCCATGTCTAACA 343 TGCCCCCATGTCTAAC 766 ACATGG AACA 12254 12276ATGGCTTTCTCAACTTTT 344 ATGGCTTTCTCAACTT 767 AAAGG TTAA 12272 12294AAAGGATAACAGCTATC 345 AAAGGATAACAGCTAT 768 CATTGG CCAT 12279 12301AACAGCTATCCATTGGT 346 AACAGCTATCCATTGG 769 CTTAGG TCTT 12294 12316GTCTTAGGCCCCAAAAA 347 GTCTTAGGCCCCAAAA 770 TTTTGG ATTT 12608 12630CTGTAGCATTGTTCGTT 348 CTGTAGCATTGTTCGT 771 ACATGG TACA 12742 12764AACCTATTCCAACTGTT 349 AACCTATTCCAACTGT 772 CATCGG TCAT 12750 12772CCAACTGTTCATCGGCT 350 CCAACTGTTCATCGGC 773 GAGAGG TGAG 12751 
12773CAACTGTTCATCGGCTG 351 CAACTGTTCATCGGCT 774 AGAGGG GAGA 12757 12779TTCATCGGCTGAGAGGG 352 TTCATCGGCTGAGAGG 775 CGTAGG GCGT 12847 12869GCAATCCTATACAACCG 353 GCAATCCTATACAACC 776 TATCGG GTAT 12856 12878TACAACCGTATCGGCGA 354 TACAACCGTATCGGCG 777 TATCGG ATAT 12958 12980CCAAGCCTCACCCCACT 355 CCAAGCCTCACCCCAC 778 ACTAGG TACT 12979 13001GGCCTCCTCCTAGCAGC 356 GGCCTCCTCCTAGCAG 779 AGCAGG CAGC 12997 13019GCAGGCAAATCAGCCC 357 GCAGGCAAATCAGCCC 780 AATTAGG AATT 13030 13052TGACTCCCCTCAGCCAT 358 TGACTCCCCTCAGCCA 781 AGAAGG TAGA 13081 13103TCAAGCACTATAGTTGT 359 TCAAGCACTATAGTTG 782 AGCAGG TAGC 13156 13178CAAACTCTAACACTATG 360 CAAACTCTAACACTAT 783 CTTAGG GCTT 13246 13268TTCTCCACTTCAAGTCA 361 TTCTCCACTTCAAGTC 784 ACTAGG AACT 13267 13289GGACTCATAATAGTTAC 362 GGACTCATAATAGTTA 785 AATCGG CAAT 13345 13367GCCATACTATTTATGTG 363 GCCATACTATTTATGT 786 CTCCGG GCTC 13346 13368CCATACTATTTATGTGC 364 CCATACTATTTATGTG 787 TCCGGG CTCC 13393 13415GAACAAGATATTCGAA 365 GAACAAGATATTCGAA 788 AAATAGG AAAT 13396 13418CAAGATATTCGAAAAAT 366 CAAGATATTCGAAAAA 789 AGGAGG TAGG 13441 13463ACTTCAACCTCCCTCAC 367 ACTTCAACCTCCCTCA 790 CATTGG CCAT 13459 13481ATTGGCAGCCTAGCATT 368 ATTGGCAGCCTAGCAT 791 AGCAGG TAGC 13477 13499GCAGGAATACCTTTCCT 369 GCAGGAATACCTTTCC 792 CACAGG TCAC 13612 13634ATAATTCTTCTCACCCT 370 ATAATTCTTCTCACCC 793 AACAGG TAAC 13686 13708ACTAAACCCCATTAAAC 371 ACTAAACCCCATTAAA 794 GCCTGG CGCC 13693 13715CCCATTAAACGCCTGGC 372 CCCATTAAACGCCTGG 795 AGCCGG CAGC 13708 13730GCAGCCGGAAGCCTATT 373 GCAGCCGGAAGCCTAT 796 CGCAGG TCGC 13804 13826GCCCTCGCTGTCACTTT 374 GCCCTCGCTGTCACTT 797 CCTAGG TCCT 13894 13916TTTTATTTCTCCAACATA 375 TTTTATTTCTCCAACAT 798 CTCGG ACT 13936 13958CACCGCACAATCCCCTA 376 CACCGCACAATCCCCT 799 TCTAGG ATCT 14059 14081ATCATCACCTCAACCCA 377 ATCATCACCTCAACCC 800 AAAAGG AAAA 14237 14259TACAAAGCCCCCGCACC 378 TACAAAGCCCCCGCAC 801 AATAGG CAAT 14417 14439ACCCCTGACCCCCATGC 379 ACCCCTGACCCCCATG 802 CTCAGG CCTC 14579 14601AATACTAAACCCCCATA 380 AATACTAAACCCCCAT 803 AATAGG AAAT 14585 
14607AAACCCCCATAAATAGG 381 AAACCCCCATAAATAG 804 AGAAGG GAGA 14664 14686CATACATCATTATTCTC 382 CATACATCATTATTCT 805 GCACGG CGCA 14825 14847ATCTCCGCATGATGAAA 383 ATCTCCGCATGATGAA 806 CTTCGG ACTT 14837 14859TGAAACTTCGGCTCACT 384 TGAAACTTCGGCTCAC 807 CCTTGG TCCT 14867 14889CTGATCCTCCAAATCAC 385 CTGATCCTCCAAATCA 808 CACAGG CCAC 14951 14973ATCACTCGAGACGTAAA 386 ATCACTCGAGACGTAA 809 TTATGG ATTA 14981 15003ATCCGCTACCTTCACGC 387 ATCCGCTACCTTCACG 810 CAATGG CCAA 15020 15042ATCTGCCTCTTCCTACA 388 ATCTGCCTCTTCCTAC 811 CATCGG ACAT 15021 15043TCTGCCTCTTCCTCACA 389 TCTGCCTCTTCCTACA 812 ATCGGG CATC 15026 15048CTCTTCCTACACATCGG 390 CTCTTCCTACACATCG 813 GCGAGG GGCG 15038 15060ATCGGGCGAGGCCTATA 391 ATCGGGCGAGGCCTAT 814 TTACGG ATTA 15071 15093TACTCAGAAACCTGAAA 392 TACTCAGAAACCTGAA 815 CATCGG ACAT 15113 15135ACTATAGCAACAGCCTT 393 ACTATAGCAACAGCCT 816 CATAGG TCAT 15131 15153ATAGGCTATGTCCTCCC 394 ATAGGCTATGTCCTCC 817 GTGAGG CGTG 15149 15171TGAGGCCAAATATCATT 395 TGAGGCCAAATATCAT 818 CTGAGG TCTG 15150 15172GAGGCCAAATATCATTC 396 GAGGCCAAATATCATT 819 TGAGGG CTGA 15151 15173AGGCCAAATATCATTCT 397 AGGCCAAATATCATTC 820 GAGGGG TGAG 15194 15216CTATCCGCCATCCCATA 398 CTATCCGCCATCCCAT 821 CATTGG ACAT 15195 15217TATCCGCCATCCCATAC 399 TATCCGCCATCCCATA 822 ATTGGG CATT 15221 15243GACCTAGTTCAATGAAT 400 GACCTAGTTCAATGAA 823 CTGAGG TCTG 15224 15246CTAGTTCAATGAATCTG 401 CTAGTTCAATGAATCT 824 AGGAGG GAGG 15334 15356CCTCCTATTCTTGCACG 402 CCTCCTATTCTTGCAC 825 AAACGG GAAA 15335 15357CTCCTATTCTTGCACGA 403 CTCCTATTCTTGCACG 826 AACGGG AAAC 15353 15375ACGGGATCAAACAACC 404 ACGGGATCAAACAAC 827 CCCTAGG CCCCT 15416 15438TACACAATCAAAGACGC 405 TACACAATCAAAGACG 828 CCTCGG CCCT 15476 15498CTATTCTCACCAGACCT 406 CTATTCTCACCAGACC 829 CCTAGG TCCT 15590 15612CGATCCGTCCCTAACAA 407 CGATCCGTCCCTAACA 830 ACTAGG AACT 15593 15615TCCGTCCCTAACAAACT 408 TCCGTCCCTAACAAAC 831 AGGAGG TAGG 15740 15762CTCCTCATTCTAACCTG 409 CTCCTCATTCTAACCT 832 AATCGG GAAT 15743 15765CTCATTCTAACCTGAAT 410 CTCATTCTAACCTGAA 833 CGGAGG TCGG 15776 
15798AGCTACCCTTTTACCAT 411 AGCTACCCTTTTACCA 834 CATTGG TCAT 15861 15883TTGAAAACAAAATACTC 412 TTGAAAACAAAATACT 835 AAATGG CAAA 15862 15884TGAAAACAAAATACTCA 413 TGAAAACAAAATACTC 836 AATGGG AAAT 15906 15928AATACACCAGTCTTGTA 414 AATACACCAGTCTTGT 837 AACCGG AAAC 15928 15950GAGATGAAAACCTTTTT 415 GAGATGAAAACCTTTT 838 CCAAGG TCCA 16012 16034AACTATTCTCTGTTCTTT 416 AACTTTCTCTGTTCTT 839 CATGG TCA 16013 16035ACTATTCTCTGTTCTTTC 417 ACTATTCTCTGTTCTTT 840 ATGGG CAT 16014 16036CTATTCTCTGTTCTTTCA 418 CTATTCTCTGTTCTTTC 841 TGGGG ATG 16026 16048CTTTCATGGGGAAGCAG 419 CTTTCATGGGGAAGCA 842 ATTTGG GATT 16027 16049TTTCATGGGGAAGCAGA 420 TTTCATGGGGAAGCAG 843 TTTGGG ATTT 16108 16130CAGCCACCATGAATATT 421 CAGCCACCATGAATAT 844 GTACGG TGTA 16252 16274AAAGCCACCCCTCACCC 422 AAAGCCACCCCTCACC 845 ACTAGG CACT 16348 16370CAAATCCCTTCTCGTCC 423 CAAATCCCTTCTCGTC 846 CCATGG CCCA 16367 16389ATGGATGACCCCCCTCA 424 ATGGATGACCCCCCTC 847 GATAGG AGAT 16368 16390TGGATGACCCCCCTCAG 425 TGGATGACCCCCCTCA 848 ATAGGG GATA 16369 16391GGATGACCCCCCTCAGA 426 GGATGACCCCCCTCAG 849 TAGGGG ATAG 16434 16456GAGTGCTACTCTCCTCG 427 GAGTGCTACTCTCCTC 850 CTCCGG GCTC 16435 16457AGTGCTACTCTCCTCGC 428 AGTGCTACTCTCCTCG 851 TCCGGG CTCC 16449 16471CGCTCCGGGCCCATAAC 429 CGCTCCGGGCCCATAA 852 ACTTGG CACT 16450 16472GCTCCGGGCCCATAACA 430 GCTCCGGGCCCATAAC 853 CTTGGG ACTT 16451 16473CTCCGGGCCCATAACAC 431 CTCCGGGCCCATAACA 854 TTGGGG CTTG 16452 16474TCCGGGCCCATAACACT 432 TCCGGGCCCATAACAC 855 TGGGGG TTGG 16482 16504AGTGAACTGTATCCGAC 433 AGTGAACTGTATCCGA 856 ATCTGG CATC 16495 16517CGACATCTGGTTCCTAC 434 CGACATCTGGTTCCTA 857 TTCAGG CTTC 16496 16518GACATCTGGTTCCTACT 435 GACATCTGGTTCCTAC 858 TCAGGG TTCA
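The rows of Table 3 above and Table 4 below follow a simple positional rule. In the flattened rows, each sequence cell is wrapped in two pieces. For example, the Table 3 row ending at position 9548 lists CCCCTACCCCCCAATTA ... GGAGGG, which is the 23-nt (+) strand window CCCCTACCCCCCAATTAGGAGGG. Its first 20 nt, CCCCTACCCCCCAATTAGGA (SEQ ID NO: 713), are followed by the NGG sequence GGG. The first Table 4 row (17 to 39) lists CCCTATTAACCACTCAC ... GGGAGC, i.e., CCCTATTAACCACTCACGGGAGC. An NGG on the (−) strand appears on the (+) strand as CCN followed by 20 nt, and the reverse complement of those 20 nt is GCTCCCGTGAGTGGTTAATA (SEQ ID NO: 2628). The minimal Python sketch below reproduces this relationship. It is an illustration under stated assumptions, not the procedure used to generate the tables. The names `find_sites` and `reverse_complement`, and the `offset` parameter (the 1-based coordinate of the first input base), are hypothetical.

```python
"""Hypothetical sketch: enumerate NGG-adjacent gRNA target sites on both
strands of a (+) strand sequence, mirroring the column layout of Tables 3 and 4."""

from typing import List, Tuple

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(plus: str, offset: int = 1) -> List[Tuple[str, int, int, str, str]]:
    """Return (strand, start, end, 23-nt (+) strand window, 20-nt gRNA target).

    start/end are 1-based inclusive coordinates of the 23-nt window on the
    (+) strand; offset is the (assumed) coordinate of plus[0].
    """
    plus = plus.upper()
    sites = []
    # Slide a 23-nt window across the sequence (overlapping windows).
    for i in range(len(plus) - 22):
        window = plus[i:i + 23]
        start, end = offset + i, offset + i + 22
        # (+) strand site (Table 3 layout): 20-nt target, then N, G, G.
        if window[21:23] == "GG":
            sites.append(("+", start, end, window, window[:20]))
        # (-) strand site (Table 4 layout): CCN on the (+) strand, then 20 nt;
        # their reverse complement is the gRNA target read on the (-) strand.
        if window[:2] == "CC":
            sites.append(("-", start, end, window, reverse_complement(window[3:])))
    return sites


if __name__ == "__main__":
    # First Table 4 row: positions 17 to 39 on the (+) strand.
    for site in find_sites("CCCTATTAACCACTCACGGGAGC", offset=17):
        print(site)
```

The scan uses overlapping windows, so a run of C bases yields one (−) strand site per qualifying starting position. This matches the consecutive start positions (304, 305, 306, ...) seen in Table 4.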

TABLE 4 gRNA target sequences for human mtDNA carrying an NGG sequence on the (−) strand.
Columns: Chr start position (+ strand); Chr end position (+ strand); nt sequence on the (+) strand containing CCN followed by the 20 nt gRNA target sequence; SEQ ID NO:; reverse complementary sequence of the gRNA target sequence (will encode the gRNA targeting sequence); SEQ ID NO:
17 39 CCCTATTAACCACTCAC 859 GCTCCCGTGAGTGGTT 2628 GGGAGC AATA 18 40 CCTATTAACCACTCACG 860 AGCTCCCGTGAGTGGT 2629 GGAGCT TAAT 26 48 CCACTCACGGGAGCTCT 861 GCATGGAGAGCTCCCG 2630 CCATGC TGAG 43 65 CCATGCATTTGGTATTT 862 AGACGAAAATACCAA 2631 TCGTCT ATGCA 104 126 CCGGAGCACCCTATGTC 863 TACTGCGACATAGGGT 2632 GCAGTA GCTC 112 134CCCTATGTCGCAGTATC 864 AAGACAGATACTGCG 2633 TGTCTT ACATA 113 135CCTATGTCGCAGTATCT 865 AAAGACAGATACTGC 2634 GTCTTT GACAT 140 162CCTGCCTCATCCTATTA 866 GATAAATAATAGGATG 2635 TTTATC AGGC 144 166CCTCATCCTATTATTTAT 867 GTGCGATAAATAATAG 2636 CGCAC GATG 150 172CCTATTATTTATCGCAC 868 ACGTAGGTGCGATAAA 2637 CTACGT TAAT 166 188CCTACGTTCAATATTAC 869 TCGCCTGTAATATTGA 2638 AGGCGA ACGT 261 283CCACTTTCCACACAGAC 870 TATGATGTCTGTGTGG 2639 ATCATA AAAG 268 290CCACACAGACATCATAA 871 TTTTTGTTATGATGTCT 2640 CAAAAA GTG 298 320CCAAACCCCCCCTCCCC 872 GAAGCGGGGGAGGGG 2641 CGCTTC GGGTT 304 326CCCCCCTCCCCCGCTTC 873 TGGCCAGAAGCGGGG 2642 TGGCCA GAGGG 305 327CCCCCTCCCCCGCTTCT 874 GTGGCCAGAAGCGGG 2643 GGCCAC GGAGG 306 328CCCCTCCCCCGCTTCTG 875 TGTGGCCAGAAGCGG 2644 GCCACA GGGAG 307 329CCCTCCCCCGCTTCTGG 876 CTGTGGCCAGAAGCGG 2645 CCACAG GGGA 308 330CCTCCCCCGCTTCTGGC 877 GCTGTGGCCAGAAGCG 2646 CACAGC GGGG 311 333CCCCCGCTTCTGGCCAC 878 AGTGCTGTGGCCAGAA 2647 AGCACT GCGG 312 334CCCCGCTTCTGGCCACA 879 AAGTGCTGTGGCCAGA 2648 GCACTT AGCG 313 335CCCGCTTCTGGCCACAG 880 TAAGTGCTGTGGCCAG 2649 CACTTA AAGC 314 336CCGCTTCTGGCCACAGC 881 TTAAGTGCTGTGGCCA 2650 ACTTAA GAAG 324 346CCACAGCACTTAAACAC 882 AGAGATGTGTTTAAGT 2651 ATCTCT GCTG 348 370CCAAACCCCAAAAACA 883 GGTTCTTTGTTTTTGGG 2652 AAGAACC GTT 353 375CCCCAAAAACAAAGAA 884 GTTAGGGTTCTTTGTTT 2653 CCCTAAC TTG 354
376CCCAAAAACAAAGAAC 885 TGTTAGGGTTCTTTGTT 2654 CCTAACA TTT 355 377CCAAAAACAAAGAACC 886 GTGTTAGGGTTCTTTG 2655 CTAACAC TTTT 369 391CCCTAACACCAGCCTAA 887 ATCTGGTTAGGCTGGT 2656 CCAGAT GTTA 370 392CCTAACACCAGCCTAAC 888 AATCTGGTTAGGCTGG 2657 CAGATT TGTT 377 399CCAGCCTAACCAGATTT 889 AATTTGAAATCTGGTT 2658 CAAATT AGGC 381 403CCTAACCAGATTTCAAA 890 ATAAAATTTGAAATCT 2659 TTTTAT GGTT 386 408CCAGATTTCAAATTTTA 891 AAAAGATAAAATTTGA 2660 TCTTTT AATC 433 455CCCCCCAACTAACACAT 892 AAAATAATGTGTTAGT 2661 TATTTT TGGG 434 456CCCCCAACTAACACATT 893 GAAAATAATGTGTTAG 2662 ATTTTC TTGG 435 457CCCCAACTAACACATTA 894 GGAAAATAATGTGTTA 2663 TTTTCC GTTG 436 458CCCAACTAACACATTAT 895 GGGAAAATAATGTGTT 2664 TTTCCC AGTT 437 459CCAACTAACACATTATT 896 GGGGAAAATAATGTGT 2665 TTCCCC TAGT 456 478CCCCTCCCACTCCCATA 897 TAGTAGTATGGGAGTG 2666 CTACTA GGAG 457 479CCCTCCCACTCCCATAC 898 TTAGTAGTATGGGAGT 2667 TACTAA GGGA 458 480CCTCCCACTCCCATACT 899 ATTAGTAGTATGGGAG 2668 ACTAAT TGGG 461 483CCCACTCCCATACTACT 900 GAGATTAGTAGTATGG 2669 AATCTC GAGT 462 484CCACTCCCATACTACTA 901 TGAGATTAGTAGTATG 2670 ATCTCA GGAG 467 489CCCATACTACTAATCTC 902 ATTGATGAGATTAGTA 2671 ATCAAT GTAT 468 490CCATACTACTAATCTCA 903 TATTGATGAGATTAGT 2672 TCAATA AGTA 494 516CCCCCGCCCATCCTACC 904 GTGCTGGGTAGGATGG 2673 CAGCAC GCGG 495 517CCCCGCCCATCCTACCC 905 TGTGCTGGGTAGGATG 2674 AGCACA GGCG 496 518CCCGCCCATCCTACCCA 906 GTGTGCTGGGTAGGAT 2675 GCACAC GGGC 497 519CCGCCCATCCTACCCAG 907 TGTGTGCTGGGTAGGA 2676 CACACA TGGG 500 522CCCATCCTACCCAGCAC 908 GTGTGTGTGCTGGGTA 2677 ACACAC GGAT 501 523CCATCCTACCCAGCACA 909 TGTGTGTGTGCTGGTA 2678 CACACA AGGA 505 527CCTACCCAGCACACACA 910 GCGGTGTGTGTGTGCT 2679 CACCGC GGGT 509 531CCCAGCACACACACACC 911 AGCAGCGGTGTGTGTG 2680 GCTGCT TGCT 510 532CCAGCACACACACACCG 912 TAGCAGCGGTGTGTGT 2681 CTGCTA GTGC 524 546CCGCTGCTAACCCCATA 913 TCGGGGTATGGGGTTA 2682 CCCCGA GCAG 534 556CCCCATACCCCGAACCA 914 TTTGGTTGGTTCGGGG 2683 ACCAAA TATG 535 557CCCATACCCCGAACCAA 915 GTTTGGTTGGTTCGGG 2684 CCAAAC GTAT 536 558CCATACCCCGAACCAAC 916 GGTTTGGTTGGTTCGG 2685
CAAACC GGTA 541 563CCCCGAACCAACCAAAC 917 TTTGGGGTTTGGTTGG 2686 CCCAAA TTCG 542 564CCCGAACCAACCAAACC 918 CTTTGGGGTTTGGTTG 2687 CCAAAG GTTC 543 565CCGAACCAACCAAACCC 919 TCTTTGGGGTTTGGTT 2688 CAAAGA GGTT 548 570CCAACCAAACCCCAAA 920 GGGTGTCTTTGGGGTT 2689 GACACCC TGGT 552 574CCAAACCCCAAAGACA 921 TGGGGGGTGTCTTTGG 2690 CCCCCCA GGTT 557 579CCCCAAAGACACCCCCC 922 AACTGTGGGGGGTGTC 2691 ACAGTT TTTG 558 580CCCAAAGACACCCCCCA 923 AAACTGTGGGGGGTGT 2692 CAGTTT CTTT 559 581CCAAAGACACCCCCCAC 924 TAAACTGTGGGGGGTG 2693 AGTTTA TCTT 568 590CCCCCCACAGTTTATGT 925 TAAGCTACATAAACTG 2694 AGCTTA TGGG 569 591CCCCCACAGTTTATGTA 926 GTAAGCTACATAAACT 2695 GCTTAC GTGG 570 592CCCCACAGTTTATGTAG 927 GGTAAGCTACATAAAC 2696 CTTACC TGTG 571 593CCCACAGTTTATGTAGC 928 AGGTAAGCTACATAAA 2697 TTACCT CTGT 572 594CCACAGTTTATGTAGCT 929 GAGGTAAGCTACATAA 2698 TACCTC ACTG 591 613CCTCCTCAAAGCAATAC 930 TTCAGTGTATTGCTTT 2699 ACTGAA GAGG 594 616CCTCAAAGCAATACACT 931 ATTTTCAGTGTATTGC 2700 GAAAAT TTTG 637 659CCCCATAAACAAATAGG 932 ACCAAACCTATTTGTT 2701 TTTGGT TATG 638 660CCCATAAACAAATAGGT 933 GACCAAACCTATTTGT 2702 TTGGTC TTAT 639 661CCATAAACAAATAGGTT 934 GGACCAAACCTATTTG 2703 TGGTCC TTTA 660 682CCTAGCCTTTCTATTAG 935 TAAGAGCTAATAGAA 2704 CTCTTA AGGCT 665 687CCTTTCTATTAGCTCTTA 936 CTTACTAAGAGCTAAT 2705 GTAAG AGAA 705 727CCCCGTTCCAGTGAGTT 937 AGGGTGAACTCACTGG 2706 CACCCT AACG 706 728CCCGTTCCAGTGAGTTC 938 GAGGGTGAACTCACTG 2707 ACCCTC GAAC 707 729CCGTTCCAGTGAGTTCA 939 AGAGGGTGAACTCACT 2708 CCCTCT GGAA 712 734CCAGTGAGTTCACCCTC 940 GATTTAGAGGGTGAAC 2709 TAAATC TCAC 724 746CCCTCTAAATCACCACG 941 TTTGATCGTGGTGATT 2710 ATCAAA TAGA 725 747CCTCTAAATCACCACGA 942 TTTTGATCGTGGTGAT 2711 TCAAAA TTAG 736 758CCACGATCAAAAGGAA 943 ATGCTTGTTCCTTTTGA 2712 CAAGCAT TCG 792 814CCTAGCCACACCCCCAC 944 TTTCCCGTGGGGGTGT 2713 GGGAAA GGCT 797 819CCACACCCCCACGGGAA 945 TGCTGTTTCCCGTGGG 2714 ACAGCA GGTG 802 824CCCCCACGGGAAACAG 946 ATCACTGCTGTTTCCC 2715 CAGTGAT GTGG 803 825CCCCACGGGAAACAGC 947 AATCACTGCTGTTTCC 2716 AGTGATT CGTG 804 826CCCACGGGAAACAGCA 948 
TAATCACTGCTGTTTC 2717 GTGATTA CCGT 805 827CCACGGGAAACAGCAG 949 TTAATCACTGCTGTTT 2718 TGATTAA CCCG 828 850CCTTTAGCAATAAACGA 950 AAACTTTCGTTTATTG 2719 AAGTTT CTAA 867 889CCCCAGGGTTGGTCAAT 951 CACGAAATTGACCAAC 2720 TTCGTG CCTG 868 890CCCAGGGTTGGTCAATT 952 GCACGAAATTGACCAA 2721 TCGTGC CCCT 869 891CCAGGGTTGGTCAATTT 953 GGCACGAAATTGACCA 2722 CGTGCC ACCC 890 912CCAGCCACCGCGGTCAC 954 AATCGTGTGACCGCGG 2723 ACGATT TGGC 894 916CCACCGCGGTCACACGA 955 GGTTAATCGTGTGACC 2724 TTAACC GCGG 897 919CCGCGGTCACACGATTA 956 TTGGGTTAATCGTGTG 2725 ACCCAA ACCG 915 937CCCAAGTCAATAGAAGC 957 ACGCCGGCTTCTATTG 2726 CGGCGT ACTT 916 938CCAAGTCAATAGAAGCC 958 TACGCCGGCTTCTATT 2727 GGCGTA GACT 931 953CCGGCGTAAAGAGTGTT 959 ATCTAAAACACTCTTT 2728 TTAGAT ACGC 956 978CCCCCTCCCCAATAAAG 960 TTTTAGCTTTTTGGG 2729 CTAAAA GAGG 957 979CCCCTCCCCAATAAAGC 961 GTTTTAGCTTTATTGG 2730 TAAAAC GGAG 958 980CCCTCCCCAATAAAGCT 962 AGTTTTAGCTTTATTG 2731 AAAACT GGGA 959 981CCTCCCCAATAAAGCTA 963 GAGTTTTAGCTTTATT 2732 AAACTC GGGG 962 984CCCCAATAAAGCTAAAA 964 GGTGAGTTTTAGCTTT 2733 CTCACC ATTG 963 985CCCAATAAAGCTAAAAC 965 AGGTGAGTTTTAGCTT 2734 TCACCT TATT 964 986CCAATAAAGCTAAAACT 966 CAGGTGAGTTTTAGCT 2735 CACCTG TTAT 983 1005CCTGAGTTGTAAAAAAC 967 ACTGGAGTTTTTTACA 2736 TCCAGT ACTC 1001 1023CCAGTTGACACAAAATA 968 GTAGTCTATTTTGTGT 2737 GACTAC CAAC 1064 1086CCCAAACTGGGATTAGA 969 GGGGTATCTAATCCCA 2738 TACCCC GTTT 1065 1087CCAAACTGGGATTAGAT 970 TGGGGTATCTAATCCC 2739 ACCCCA AGTT 1083 1105CCCCACTATGCTTAGCC 971 GTTTAGGGCTAAGCAT 2740 CTAAAC AGTG 1084 1106CCCACTATGCTTAGCCC 972 GGTTTAGGGCTAAGCA 2741 TAAACC TAGT 1085 1107CCACTATGCTTAGCCCT 973 AGGTTTAGGGCTAAGC 2742 AAACCT ATAG 1098 1120CCCTAAACCTCAACAGT 974 GATTTAACTGTTGAGG 2743 TAAATC TTTA 1099 1121CCTAAACCTCAACAGTT 975 TGATTTAACTGTTGAG  2744 AAATCA GTTT 1105 1127CCTCAACAGTTAAATCA 976 TTTTGTTGATTTAACTG 2745 ACAAAA TTG 1135 1157CCAGAACACTACGAGCC 977 AGCTGTGGCTCGTAGT 2746 ACAGCT GTTC 1150 1172CCACAGCTTAAAACTCA 978 GTCCTTTGAGTTTTAA 2747 AAGGAC GCTG 1172 1194CCTGGCGGTGCTTCATA 979 GAGGGATATGAAGCA 2748 
TCCCTC CCGCC 1190 1212CCCTCTAGAGGAGCCTG 980 ACAGAACAGGCTCCT 2749 TTCTGT TAGA 1191 1213CCTCTAGAGGAGCCTGT 981 TACAGAACAGGCTCCT 2750 TCTGTA CTAG 1203 1225CCTGTTCTGTAATCGAT 982 GGGTTTATCGATTACA 2751 AAACCC GAAC 1223 1245CCCCGATCAACCTCAC 983 AGAGGTGGTGAGGTTG 2752 ACCTCT ATCG 1224 1246CCCGATCAACCTCACCA 984 AAGAGGTGGTGAGGTT 2753 CCTCTT GATC 1225 1247CCGATCAACCTCACCAC 985 CAAGAGGTGGTGAGG 2754 CTCTTG TTGAT 1233 1255CCTCACCACCTCTTGCT 986 AGGCTGAGCAAGAGG 2755 CAGCCT TGGTG 1238 1260CCACCTCTTGCTCAGCC 987 TATATAGGCTGAGCAA 2756 TATATA GAGG 1241 1263CCTCTTGCTCAGCCTAT 988 CGGTATATAGGCTGAG 2757 ATACCG CAAG 1253 1275CCTATATACCGCCATCT 989 TGCTGAAGATGGCGGT 2758 TCAGCA ATAT 1261 1283CCGCCATCTTCAGCAAA 990 TCAGGGTTTGCTGAAG 2759 CCCTGA ATGG 1264 1286CCATCTTCAGCAAACCC 991 TCATCAGGGTTTGCTG 2760 TGATGA AAGA 1278 1300CCCTGATGAAGGCTACA 992 TTACTTTGTAGCCTTC 2761 AAGTAA ATCA 1279 1301CCTGATGAAGGCTACAA 993 CTTACTTTGTAGCCTTC 2762 AGTAAG ATC 1310 1332CCCACGTAAAGACGTTA 994 TTGACCTAACGTCTTT 2763 GGTCAA ACGT 1311 1333CCACGTAAAGACGTTAG 995 CTTGACCTAACGTCTT 2764 GTCAAG TACG 1340 1362CCCATGAGGTGGCAAG 996 CCCATTTGTTGCCACC 2765 AAATGGG TCAT 1341 1363CCATGAGGTGGCAAGA 997 GCCCATTTCTTGCCAC 2766 AATGGGC CTCA 1375 1397CCCCAGAAAACTACGAT 998 AGGGCTATCGTAGTTT 2767 AGCCCT TCTG 1376 1398CCCAGAAAACTACGATA 999 AAGGGCTATCGTAGTT 2768 GCCCTT TTCT 1377 1399CCAGAAAACTACGATA 1000 TAAGGGCTATCGTAGT 2769 GCCCTTA TTTC 1394 1416CCCTTATGAAACTTAAG 1001 TCGACCCTTAAGTTTC 2770 GGTCGA ATAA 1395 1417CCTTATGAAACTTAAGG 1002 TTCGACCCTTAAGTTT 2771 GTCGAA CATA 1465 1487CCCTGAAGCGCGTACAC 1003 GGCGGTGTGTACGCGC 2772 ACCGCC TTCA 1466 1488CCTGAAGCGCGTACACA 1004 GGGCGGTGTGTACGCG 2773 CCGCCC CTTC 1483 1505CCGCCCTCACCCTCCT 1005 TACTTGAGGAGGGTGA 2774 CAAGTA CGGG 1486 1508CCCGTCACCCTCCTCAA 1006 GTATACTTGAGGAGGG 2775 GTATAC TGAC 1487 1509CCGTCACCCTCCTCAAG 1007 AGTATACTTGAGGAGG 2776 TATACT GTGA 1493 1515CCCTCCTCAAGTATACT 1008 CTTTGAAGTATACTTG 2777 TCAAAG AGGA 1494 1516CCTCCTCAAGTATACTT 1009 CCTTTGAAGTATACTT 2778 CAAAGG GAGG 1497 1519CCTCAAGTATACTTCAA
1010 TGTCCTTTGAAGTATA 2779 AGGACA CTTG 1531 1553CCCCTACGCATTTATAT 1011 TCCTCTATATAAATGC 2780 AGAGGA GTAG 1532 1554CCCTACGCATTTATATA 1012 CTCCTCTATATAAATG 2781 GAGGAG CGTA 1533 1555CCTACGCATTTATATAG 1013 TCTCCTCTATATAAAT 2782 AGGAGA GCGT 1601 1623CCAGAGTGTAGCTTAAC 1014 CTTTGTGTTAAGCTAC 2783 ACAAAG ACTC 1626 1648CCCAACTTACACTTAGG 1015 AAATCTCCTAAGTGTA 2784 AGATTT AGTTT 1627 1649CCAACTTACACTTAGGA 1016 GAAATCTCCTAAGTGT 2785 GATTTC AAGT 1662 1684CCGCTCTGAGCTAAACC 1017 GGGCTAGGTTTAGCTC 2786 TAGCCC AGAG 1677 1699CCTAGCCCCAAACCCAC 1018 GGTGGAGTGGGTTTGG 2787 TCCACC GGCT 1682 1704CCCCAAACCCACTCCAC 1019 AGTAAGGTGGAGTGG 2788 CTTACT GTTTG 1683 1705CCCAAACCCACTCCACC 1020 TAGTAAGGTGGAGTGG 2789 TTACTA GTTT 1684 1706CCAAACCCACTCCACCT 1021 GTAGTAAGGTGGAGTG 2790 TACTAC GGTT 1689 1711CCCACTCCACCTTACTA 1022 GTCTGGTAGTAAGGTG 2791 CCAGAC GAGT 1690 1712CCACTCCACCTTACTAC 1023 TGTCTGGTAGTAAGGT 2792 CAGACA GGAG 1695 1717CCACCTTACTACCAGAC 1024 AAGGTTGTCTGGTAGT 2793 AACCTT AAGG 1698 1720CCTTACTACCAGACAAC 1025 GCTAAGGTTGTCTGGT 2794 CTTAGC AGTA 1706 1728CCAGACAACCTTAGCC 1026 ATGGTTTGGCTAAGGT 2795 AACCAT TGTC 1714 1736CCTTAGCCAAACCATTT 1027 TTGGGTAAATGGTTTG 2796 ACCCAA GCTA 1720 1742CCAAACCATTTACCCAA 1028 CTTTATTTGGGTAAAT 2797 ATAAAG GGTT 1725 1747CCATTTACCCAAATAAA 1029 CTATACTTTATTTGGG 2798 GTATAG TAAA 1732 1754CCCAAATAAAGTATAGG 1030 CTATCGCCTATACTTT 2799 CGATAG ATTT 1733 1755CCAAATAAAGTATAGGC 1031 TCTATCGCCTATACTTT 2800 GATAGA ATT 1764 1786CCTGGCGCAATAGATAT 1032 GGTACTATATCTATTG 2801 AGTACC CGCC 1785 1807CCGCAAGGGAAAGATG 1033 AATTTTTCATCTTTCCC 2802 AAAAATT TTG 1812 1834CCAAGCATAATATAGCA 1034 AGTCCTTGCTATATTA 2803 AGGACT TGCT 1837 1859CCCCTATACCTTCTGCA 1035 TCATTATGCAGAAGGT 2804 TAATGA ATAG 1838 1860CCCTATACCTTCTGCAT 1036 TTCATTATGCAGAAGG 2805 AATGAA TATA 1839 1861CCTATACCTTCTGCATA 1037 ATTCATTATGCAGAAG 2806 ATGAAT GTAT 1845 1867CCTTCTGCATAATGAAT 1038 TAGTTAATTCATTATG 2807 TAACTA CAGA 1889 1911CCAAAGCTAAGACCCCC 1039 GGTTTCGGGGGTCTTA 2808 GAAACC GCTT 1901 1923CCCCCGAAACCAGACG 1040 GGTAGCTCGTCTGGTT
2809 AGCTACC TCGG 1902 1924CCCCGAAACCAGACGA 1041 AGGTAGCTCGTCTGGT 2810 GCTAC TTCG 1903 1925CCCGAAACCAGACGAG 1042 TAGGTAGCTCGTCTGG 2811 CTACCTA TTTC 1904 1926CCGAAACCAGACGAGC 1043 TTAGGTAGCTCGTCTG 2812 TACCTAA GTTT 1910 1932CCAGACGAGCTACCTAA 1044 CTGTTCTTTTGGTAGCT 2813 GAACAG CGTC 1922 1944CCTAAGAACAGCTAAA 1045 GTGCTCTTTTAGCTGTT 2814 AGAGCAC CTT 1946 1968CCCGTCTATGTAGCAAA 1046 CACTATTTTGCTACAT 2815 ATAGTG AGAC 1947 1969CCGTCTATGTAGCAAAA 1047 CCACTATTTTGCTACA 2816 TAGTGG TAGA 1996 2018CCTACCGAGCCTGGTGA 1048 CAGCTATCACCAGGCT 2817 TAGCTG CGGT 2000 2022CCGAGCCTGGTGATAGC 1049 CAACCAGCTATCACCA 2818 TGGTTG GGCT 2005 2027CCTGGTGATAGCTGGTT 1050 TTGGACAACCAGCTAT 2819 GTCCAA CACC 2024 2046CCAAGATAGAATCTTAG 1051 GTTGAACTAAGATTCT 2820 TTCAAC ATCT 2057 2079CCCACAGAACCCTCTAA 1052 GGGGATTTAGAGGGTT 2821 ATCCCC CTGT 2058 2080CCACAGAACCCTCTAAA 1053 AGGGGATTTAGAGGGT 2822 TCCCCT TCTG 2066 2088CCCTCTAAATCCCCTTG 1054 AATTTACAAGGGGATT 2823 TAAATT TAGA 2067 2089CCTCTAAATCCCCTTGT 1055 AAATTTACAAGGGGAT 2824 AAATTT TTAG 2076 2098CCCCTTGTAAATTTAAC 1056 CTAACAGTTAAATTTA 2825 TGTTAG CAAG 2077 2099CCCTTGTAAATTTAACT 1057 ACTAACAGTTAAATTT 2826 GTTAGT ACAA 2078 2100CCTTGTAAATTTAACTG 1058 GACTAACAGTTAAATT 2827 TTAGTC TACA 2100 2122CCAAAGAGGAACAGCT 1059 TCCAAAGAGCTGTTCC 2828 CTTTGGA TCTT 2136 2158CCTTGTAGAGAGAGTAA 1060 AATTTTTTACTCTCTCT 2829 AAAATT ACA 2164 2186CCCATAGTAGGCCTAAA 1061 GCTGCTTTTAGGCCTA 2830 AGCAGC CTAT 2165 2187CCATAGTAGGCCTAAAA 1062 GGCTGCTTTTAGGCCT 2831 GCAGCC ACTA 2175 2197CCTAAAAGCAGCCACCA 1063 CTTAATTGGTGGCTGC 2832 ATTAAG TTTT 2186 2208CCACCAATTAAGAAAGC 1064 TTGAACGCTTTCTTAA 2833 GTTCAA TTGG 2189 2211CCAATTAAGAAAGCGTT 1065 AGCTTGAACGCTTTCT 2834 CAAGCT AAT 2217 2239CCCACTACCTAAAAAAT 1066 TTTGGGATTTTTTAGG 2835 CCCAAA TAGT 2218 2240CCACTACCTAAAAAATC 1067 GTTTGGGATTTTTTAG 2836 CCAAAC GTAG 2224 2246CCTAAAAAATCCCAAAC 1068 TTATATGTTTGGGATT 2837 ATATAA TTTT 2234 2256CCCAAACATATAACTGA 1069 AGGAGTTCAGTTATAT 2838 ACTCCT GTTT 2235 2257CCAAACATATAACTGAA 1070 GAGGAGTTCAGTTATA 2839 CTCCTC TGTT
2254 2276CCTCACACCCAATTGGA 1071 GATTGGTCCAATTGGG 2840 CCAATC TGTG 2261 2283CCCAATTGGACCAATCT 1072 GGTGATAGATTGGTCC 2841 ATCACC AATT 2262 2284CCAATTGGACCAATCTA 1073 GGGTGATAGATTGGTC 2842 TCACCC CAAT 2271 2293CCAATCTATCACCCTAT 1074 TCTTCTATAGGGTGAT 2843 AGAAGA AGAT 2282 2304CCCTATAGAAGAACTAA 1075 CTAACATTAGTTCTTC 2844 TGTTAG TATA 2283 2305CCTATAGAAGAACTAAT 1076 ACTAACATTAGTTCTT 2845 GTTAGT CTAT 2328 2350CCTCCGCATAAGCCTGC 1077 TCTGACGCAGGCTTAT 2846 GTCAGA GCGG 2331 2353CCGCATAAGCCTGCGTC 1078 TAATCTGACGCAGGCT 2847 AGATTA TATG 2340 2362CCTGCGTCAGATTAAAA 1079 TCAGTGTTTTAATCTG 2848 CACTGA ACGC 2378 2400CCCAATATCTACAATCA 1080 GTTGGTTGATTGTAGA 2849 ACCAAC TATT 2379 2401CCAATATCTACAATCAA 1081 TGTTGGTTGATTGTAG 2850 CCAACA ATAT 2396 2418CCAACAAGTCATTATTA 1082 TGAGGGTAATAATGAC 2851 CCCTCA TTGT 2413 2435CCCTCACTGTCAACCCA 1083 CTGTGTTGGGTTGACA 2852 ACACAG GTGA 2414 2436CCTCACTGTCAACCCAA 1084 CCTGTGTTGGGTTGAC 2853 CACAGG AGTG 2426 2448CCCAACACGGCATGCT 1085 CTTATGAGCATGCCTG 2854 CATAAG TGTT 2427 2449CCAACACAGGCATGCTC 1086 CCTTATGAGCATGCCT 2855 ATAAGG GTGT 2488 2510CCCCGCCTGTTTACCAA 1087 ATGTTTTTGGTAAACA 2856 AAACAT GGCG 2489 2511CCCGCCTGTTTACCAAA 1088 GATGTTTTTGGTAAAC 2857 AACATC AGGC 2490 2512CCGCCTGTTTACCAAAA 1089 TGATGTTTTTGGTAAA 2858 ACATCA CAGG 2493 2515CCTGTTTACCAAAAACA 1090 AGGTGATGTTTTTGGT 2859 TCACCT AAAC 2501 2523CCAAAAACATCACCTCT 1091 GATGCTAGAGGTGATG 2860 AGCATC TTTT 2513 2535CCTCTAGCATCACCAGT 1092 TCTAATACTGGTGATG 2861 ATTAGA CTAG 2525 2547CCAGTATTAGAGGCACC 1093 GCAGGCGGTGCCTCTA 2862 GCCTGC ATAC 2540 2562CCGCCTGCCCAGTGACA 1094 AACATGTGTCACTGGG 2863 CATGTT CAGG 2543 2565CCTGCCCAGTGACACAT 1095 TTAAACATGTGTCACT 2864 GTTTAA GGGC 2547 2569CCCAGTGACACATGTTT 1096 GCCGTTAAACATGTGT 2865 AACGGC CACT 2548 2570CCAGTGACACATGTTTA 1097 GGCCGTTAAACATGTG 2866 ACGGCC TCAC 2569 2591CCGCGGTACCCTAACCG 1098 TTTGCACGGTTAGGGT 2867 TGCAAA ACCG 2577 2599CCCTAACCGTGCAAAGG 1099 ATGCTACCTTTGCACG 2868 TAGCAT GTTA 2578 2600CCTAACCGTGCAAAGGT 1100 TATGCTACCTTTGCAC 2869 AGCATA GGTT 2583
2605CCGTGCAAAGGTAGCAT 1101 GTGATTATGTACCT 2870 AATCAC TGCA 2611 2633CCTTAAATAGGGACCTG 1102 TTCATACAGGTCCCTA 2871 TATGAA TTTA 2624 2646CCTGTATGAATGGCTCC 1103 CCTCGTGGAGCCATTC 2872 ACGAGG ATAC 2639 2661CCACGAGGGTTCAGCTG 1104 AAGAGACAGCTGAAC 2873 TCTCTT CCTCG 2670 2692CCAGTGAAATTGACCTG 1105 CACGGGCAGGTCAATT 2874 CCCGTG TCAC 2683 2705CCTGCCCGTGAAGAGGC 1106 ATGCCCGCCTCTTCAC 2875 GGGCAT GGGC 2687 2709CCCGTGAAGAGGCGGG 1107 TGTTATGCCCGCCTCT 2876 CATAACA TCAC 2688 2710CCGTGAAGAGGCGGGC 1108 GTGTTATGCCCGCCTC 2877 ATAACAC TTCA 2726 2748CCCTATGGAGCTTTAAT 1109 TAATAAATTAAAGC 2878 TTATTA CATA 2727 2749CCTATGGAGCTTTAATT 1110 TTAATAAATTAAAGCT 2879 TATTAA CCAT 2761 2783CCTAACAAACCCACAGG 1111 TTAGGACCTGTGGGTT 2880 TCCTAA TGTT 2770 2792CCCACAGGTCCTAAACT 1112 TTTGGTAGTTTAGGAC 2881 ACCAAA CTGT 2771 2793CCACAGGTCCTAAACTA 1113 GTTTGGTAGTTTAGGA 2882 CCAAAC CCTG 2779 2801CCTAAACTACCAAACCT 1114 TAATGCAGGTTTGGTA 2883 GCATTA GTTT 2788 2810CCAAACCTGCATTAAAA 1115 CGAAATTTTTAATGCA 2884 ATTTCG GGTT 2793 2815CCTGCATTAAAAATTTC 1116 CCAACCGAAATTTTTA 2885 GGTTGG ATGC 2821 2843CCTCGGAGCAGAACCCA 1117 GGAGGTTGGGTTCTGC 2886 ACCTCC TCCG 2834 2856CCCAACCTCCGAGCAGT 1118 GCATGTACTGCTCGGA 2887 ACATGC GGTT 2835 2857CCAACCTCCGAGCAGTA 1119 AGCATGTACTGCTCGG 2888 CATGCT AGGT 2839 2861CCTCCGAGCAGTACATG 1120 TCTTAGCATGTACTGC 2889 CTAAGA TCGG 2842 2864CCGAGCAGTACATGCTA 1121 AAGTCTTAGCATGTAC 2890 AGACTT TGCT 2867 2889CCAGTCAAAGCGAACTA 1122 GTATACTTAGTTCGCTT 2891 CTATAC TGAC 2899 2921CCAATAACTTGACCAAC 1123 TGTTCCGTTGGTCAACG 2892 GGAACA TTAT 2911 2933CCAACGGAACAAGTTAC 1124 CCTAGGGTAACTTGTT 2893 CCTAGG CCGT 2927 2949CCCTAGGGATAACAGCG 1125 GGATTGCGCTGTTATC 2894 CAATCC CCTA 2928 2950CCTAGGGATAACAGCGC 1126 AGGATTGCGCTGTTAT 2895 AATCCT CCCT 2948 2970CCTATTCTAGAGTCCAT 1127 GTTGATATGGACTCTA 2896 ATCAAC GAAT 2961 2983CCATATCAACAATAGGG 1128 CGTAAACCCTATTGTT 2897 TTTACG GATA 2985 3007CCTCGATGTTGGATCAG 1129 GATGTCCTGATCCAAC 2898 GACATC ATCG 3007 3029CCCGATGGTGCAGCCGC 1130 TTAATAGCGGCTGCAC 2899 TATTAA CATC 3008
3030CCGATGGTGCAGCCGCT 1131 TTTAATAGCGGCTGCA 2900 ATTAAA CCAT 3020 3042CCGCTATTAAAGGTTCG 1132 AACAAACGAACCTTTA 2901 TTTGTT ATAG 3056 3078CCTACGTGATCTGAGTT 1133 GGTCTGAACTCAGATC 2902 CAGACC ACGT 3077 3099CCGGAGTAATCCAGGTC 1134 GAAACCGACCTGGATT 2903 GGTTTC ACTC 3087 3109CCAGGTCGGTTTCTATC 1135 AANGTAGATAGAAAC 2904 TACNTT CGACC 3116 3138CCTCCCTGTACGAAAGG 1136 TCTTGTCCTTTCGTACA 2905 ACAAGA GGG 3119 3141CCCTGTACGAAAGGACA 1137 TTCTCTTGTCCTTTCGT 2906 AGAGAA ACA 3120 3142CCTGTACGAAAGGACA 1138 TTTCTCTTGTCCTTTCG 2907 AGAGAAA TAC 3148 3170CCTACTTCACAAAGCGC 1139 GGGAAGGCGCTTTGTG 2908 CTTCCC AAGT 3164 3186CCTTCCCCCGTAAATGA 1140 ATGATATCATTTACGG 2909 TATCAT GGGA 3168 3190CCCCCGTAAATGATATC 1141 TGAGATGATATCATTT 2910 ATCTCA ACGG 3169 3191CCCCGTAAATGATATCA 1142 TTGAGATGATATCATT 2911 TCTCAA TACG 3170 3192CCCGTAAATGATATCAT 1143 GTTGAGATGATATCAT 2912 CTCAAC TTAC 3171 3193CCGTAAATGATATCATC 1144 AGTTGAGATGATATCA 2913 TCAACT TTTA 3204 3226CCCACACCCACCCAAGA 1145 CCCTGTTCTTGGGTGG 2914 ACAGGG GTGT 3205 3227CCACACCCACCCAAGAA 1146 ACCCTGTTCTTGGGTG 2915 CAGGGT GGTG 3210 3232CCCACCCAAGAACAGG 1147 AACAAACCCTGTTCTT 2916 GTTTGTT GGGT 3211 3233CCACCCAAGAACAGGG 1148 TAACAAACCCTGTTCT 2917 TTTGTTA TGGG 3214 3236CCCAAGAACAGGGTTTG 1149 TCTTAACAAACCCTGT 2918 TTAAGA TCTT 3215 3237CCAAGAACAGGGTTTGT 1150 ATCTTAACAAACCCTG 2919 TAAGAT TTCT 3245 3267CCCGGTAATCGCATAAA 1151 TTAAGTTTTATGCGAT 2920 ACTTAA TACC 3246 3268CCGGTAATCGCATTAAAA 1152 TTTAAGTTTTATGCGA 2921 CTTAAA TTAC 3292 3314CCTCTTCTTAACAACAT 1153 ATGGGTATGTTGTTAA 2922 ACCCAT GAAG 3310 3332CCCATGGCCAACCTCCT 1154 AGGAGTAGGAGGTTG 2923 ACTCCT GCCAT 3311 3333CCATGGCCAACCTCCTA 1155 GAGGAGTAGGAGGTT 2924 CTCCTC GGCCA 3317 3339CCAACCTCCTACTCCTC 1156 TACAATGAGGAGTAG 2925 ATTGTA GAGGT 3321 3343CCTCCTACTCCTATTGT 1157 TGGGTACAATGAGGA 2926 ACCCA GTAGG 3324 3346CCTACTCCTCATTGTAC 1158 GAATGGGTACAATGA 2927 CCATTC GGAGT 3330 3352CCTCATTGTACCCATTC 1159 CGATTAGAATGGGTAC 2928 TAATCG AATG 3340 3362CCCATTCTAATCGCAAT 1160 AATGCCATTGCGATTA 2929 GGCATT GAAT 3341 
3363CCATTCTAATCGCAATG 1161 GAATGCCATTGCGATT 2930 GCATTC AGAA 3363 3385CCTAATGCTTACCGAAC 1162 TTTTTCGTTCGGTAAG 2931 GAAAAA CATT 3374 3396CCGAACGAAAAATTCTA 1163 ATAGCCTAGAATTTTT 2932 GGCTAT CGTT 3414 3436CCCCAACGTTGTAGGCC 1164 CGTAGGGGCCTACAAC 2933 CCTACG GTTG 3415 3437CCCAACGTTGTAGGCCC 1165 CCGTAGGGGCCTACAA 2934 CTACGG CGTT 3416 3438CCAACGTTGTAGGCCCC 1166 CCCGTAGGGGCCTACA 2935 TACGGG ACGT 3429 3451CCCCTACGGGCTACTAC 1167 AGGTTGTAGTAGCCC 2936 AACCCT GTAG 3430 3452CCCTACGGGCTACTACA 1168 AAGGGTTGTAGTAGCC 2937 ACCCTT CGTA 3431 3453CCTACGGGCTACTACAA 1169 GAAGGGTTGTAGTAGC 2938 CCCTTC CCGT 3448 3470CCCTTCGCTGACGCCAT 1170 AGTTTTATGGCGTCAG 2939 AAAACT CGAA 3449 3471CCTTCGCTGACGCCATA 1171 GAGTTTTATGGCGTCA 2940 AAACTC GCGA 3461 3483CCATAAAACTCTTCACC 1172 CTCTTTGGTGAAGAGT 2941 AAAGAG TTTA 3476 3498CCAAAGAGCCCCTAAA 1173 GGCGGGTTTTAGGGGC 2942 ACCCGCC TCTT 3484 3506CCCCTAAAACCCGCCAC 1174 GTAGATGTGGCGGGTT 2943 ATCTAC TTAG 3485 3507CCCTAAAACCCGCCACA 1175 GGTAGATGTGGCGGGT 2944 TCTACC TTTA 3486 3508CCTAAAACCCGCCACAT 1176 TGGTAGATGTGGCGGG 2945 CTACCA TTTT 3493 3515CCCGCCACATCTACCAT 1177 AGGGTGATGGTAGATG 2946 CACCCT TGGC 3494 3516CCGCCACATCTACCATC 1178 GAGGGTGATGGTAGAT 2947 ACCCTC GTGG 3497 3519CCACATCTACCATCACC 1179 GTAGAGGGTGATGGTA 2948 CTCTAC GATG 3506 3528CCATCACCCTCTACATC 1180 GGCGGTGATGTAGAG 2949 ACCGCC GGTGA 3512 3534CCCTCTACATCACCGCC 1181 GGTCGGGGCGGTGATG 2950 CCGACC TAGA 3513 3535CCTCTACATCACCGCCC 1182 AGGTCGGGGCGGTGAT 2951 CGACCT GTAG 3524 3546CCGCCCCGACCTTAGCT 1183 GGTGAGAGCTAAGGTC 2952 CTCACC GGGG 3527 3549CCCCGACCTTAGCTCTC 1184 GATGGTGAGAGCTAA 2953 ACCATC GGTCG 3528 3550CCCGACCTTAGCTCTCA 1185 CGATGGTGAGAGCTAA 2954 CCATCG GGTC 3529 3551CCGACCTTAGCTCTCAC 1186 GCGATGGTGAGAGCTA 2955 CATCGC AGGT 3533 3555CCTTAGCTCTCACCATC 1187 AAGAGCGATGGTGAG 2956 GCTCTT AGCTA 3545 3567CCATCGCTCTTCTACTA 1188 GGTTCATAGTAGAAGA 2957 TGAACC GCGA 3566 3588CCCCCCTCCCCATACCC 1189 GGGGTTGGGTATGGGG 2958 AACCCC AGGG 3567 3589CCCCCTCCCCATACCA 1190 GGGGGTTGGGTATGGG 2959 ACCCCC GAGG 3568 3590CCCCTCCCCATACCCAA 
1191 AGGGGGTTGGGTATGG 2960 CCCCCT GGAG 3569 3591CCCTCCCCATACCCAAC 1192 CAGGGGGTTGGGTATG 2961 CCCCTG GGGA 3570 3592CCTCCCCATACCCAACC 1193 CCAGGGGGTTGGGTAT 2962 CCCTGG GGGG 3573 3595CCCCATACCCAACCCCC 1194 TGACCAGGGGGTTGGG 2963 TGGTCA TATG 3574 3596CCCATACCCAACCCCCT 1195 TTGACCAGGGGGTTGG 2964 GGTCAA GTAT 3575 3597CCATACCCAACCCCCTG 1196 GTTGACCAGGGGGTTG 2965 GTCAAC GGTA 3580 3602CCCAACCCCCTGGTCAA 1197 TTGAGGTTGACCAGGG 2966 CCTCAA GGTT 3581 3603CCAACCCCCTGGTCAAC 1198 GTTGAGGTTGACCAGG 2967 CTCAAC GGGT 3585 3607CCCCCTGGTCAACCTCA 1199 CTAGGTTGAGGTTGAC 2968 ACCTAG CAGG 3586 3608CCCCTGGTCAACCTCAA 1200 CCTAGGTTGAGGTTGA 2969 CCTAGG CCAG 3587 3609CCCTGGTCAACCTCAAC 1201 GCCTAGGTTGAGGTTG 2970 CTAGGC ACCA 3588 3610CCTGGTCAACCTCAACC 1202 GGCCTAGGTTGAGGTT 2971 TAGGCC GACC 3597 3619CCTCAACCTAGGCCTCC 1203 TAAATAGGAGGCCTAG 2972 TATTTA GTTG 3603 3625CCTAGGCCTCCTATTTA 1204 CTAGAATAAATAGGA 2973 TTCTAG GGCCT 3609 3631CCTCCTATTTATTCTAGC 1205 AGGTGGCTAGAATAA 2974 CACCT ATAGG 3612 3634CCTATTTATTTAGCCA 1206 TAGAGGTGGCTAGAAT 2975 CCTCTA AAAT 3626 3648CCACCTCTAGCCTAGCC 1207 GTAAACGGCTAGGCTA 2976 GTTTAC GAGG 3629 3651CCTCTAGCCTAGCCGTT 1208 TGAGTAAACGGCTAGG 2977 TACTCA CTAG 3636 3658CCTAGCCGTTTACTCAA 1209 AGAGGATTGAGTAAA 2978 TCCTCT CGGCT 3641 3663CCGTTTACTCAATCCTC 1210 TGATCAGAGGATTGAG 2979 TGATCA TAAA 3654 3676CCTCTGATCAGGGTGAG 1211 TTGATGCTCACCCTGA 2980 CATCAA TCAG 3689 3711CCCTGATCGGCGCCTG 1212 TGCTCGCAGTGCGCCG 2981 CGAGCA ATCA 3690 3712CCTGATCGGCACTGC 1213 CTGCTCGCAGTGCGCC 2982 GAGCAG GATC 3716 3738CCCAAACAATCTCATAT 1214 GACTTCATATGAGATT 2983 GAAGTC GTTT 3717 3739CCAAACAATCTCATATG 1215 TGACTTCATATGAGAT 2984 AAGTCA TGTT 3740 3762CCCTAGCCATCATTCTA 1216 TGATAGTAGAATGATG 2985 CTATCA GCTA 3741 3763CCTAGCCATCATTCTAC 1217 TTGATAGTAGAATGAT 2986 TATCAA GGCT 3746 3768CCATCATTCTACTATCA 1218 TAATGTTGATAGTAGA 2987 ACATTA ATGA 3782 3804CCTTTAACCTCTCCACC 1219 GATAAGGGTGGAGAG 2988 CTTATC GTTAA 3789 3811CCTCTCCACCCTTATCA 1220 GTGTTGTGATAAGGGT 2989 CAACAC GGAG 3794 3816CCACCCTTATCACAACA 1221 TTCTTGTGTTGTGATA 
2990 CAAGAA AGGG
3797 3819 CCCTTATCACAACACAAGAACAC 1222 GTGTTCTTGTGTTGTGATAA 2991
3798 3820 CCTTATCACAACACAAGAACACC 1223 GGTGTTCTTGTGTTGTGATA 2992
3819 3841 CCTCTGATTACTCCTGCCATCAT 1224 ATGATGGCAGGAGTAATCAG 2993
3831 3853 CCTGCCATCATGACCCTTGGCCA 1225 TGGCCAAGGGTCATGATGGC 2994
3835 3857 CCATCATGACCCTTGGCCATAAT 1226 ATTATGGCCAAGGGTCATGA 2995
3844 3866 CCCTTGGCCATAATATGATTTAT 1227 ATAAATCATATTATGGCCAA 2996
3845 3867 CCTTGGCCATAATATGATTTATC 1228 GATAAATCATATTATGGCCA 2997
3851 3873 CCATAATATGATTTATCTCCACA 1229 TGTGGAGATAAATCATATTA 2998
3869 3891 CCACACTAGCAGAGACCAACCGA 1230 TCGGTTGGTCTCTGCTAGTG 2999
3884 3906 CCAACCGAACCCCCTTCGACCTT 1231 AAGGTCGAAGGGGGTTCGGT 3000
3888 3910 CCGAACCCCCTTCGACCTTGCCG 1232 CGGCAAGGTCGAAGGGGGTT 3001
3893 3915 CCCCCTTCGACCTTGCCGAAGGG 1233 CCCTTCGGCAAGGTCGAAGG 3002
3894 3916 CCCCTTCGACCTTGCCGAAGGGG 1234 CCCCTTCGGCAAGGTCGAAG 3003
3895 3917 CCCTTCGACCTTGCCGAAGGGGA 1235 TCCCCTTCGGCAAGGTCGAA 3004
3896 3918 CCTTCGACCTTGCCGAAGGGGAG 1236 CTCCCCTTCGGCAAGGTCGA 3005
3903 3925 CCTTGCCGAAGGGGAGTCCGAAC 1237 GTTCGGACTCCCCTTCGGCA 3006
3908 3930 CCGAAGGGGAGTCCGAACTAGTC 1238 GACTAGTTCGGACTCCCCTT 3007
3920 3942 CCGAACTAGTCTCAGGCTTCAAC 1239 GTTGAAGCCTGAGACTAGTT 3008
3953 3975 CCGCAGGCCCCTTCGCCCTATTC 1240 GAATAGGGCGAAGGGGCCTG 3009
3960 3982 CCCCTTCGCCCTATTCTTCATAG 1241 CTATGAAGAATAGGGCGAAG 3010
3961 3983 CCCTTCGCCCTATTCTTCATAGC 1242 GCTATGAAGAATAGGGCGAA 3011
3962 3984 CCTTCGCCCTATTCTTCATAGCC 1243 GGCTATGAAGAATAGGGCGA 3012
3968 3990 CCCTATTCTTCATAGCCGAATAC 1244 GTATTCGGCTATGAAGAATA 3013
3969 3991 CCTATTCTTCATAGCCGAATACA 1245 TGTATTCGGCTATGAAGAAT 3014
3983 4005 CCGAATACACAAACATTATTATA 1246 TATAATAATGTTTGTGTATT 3015
4013 4035 CCCTCACCACTACAATCTTCCTA 1247 TAGGAAGATTGTAGTGGTGA 3016
4014 4036 CCTCACCACTACAATCTTCCTAG 1248 CTAGGAAGATTGTAGTGGTG 3017
4019 4041 CCACTACAATCTTCCTAGGAACA 1249 TGTTCCTAGGAAGATTGTAG 3018
4032 4054 CCTAGGAACAACATATGACGCAC 1250 GTGCGTCATATGTTGTTCCT 3019
4058 4080 CCCCTGAACTCTACACAACATAT 1251 ATATGTTGTGTAGAGTTCAG 3020
4059 4081 CCCTGAACTCTACACAACATATT 1252 AATATGTTGTGTAGAGTTCA 3021
4060 4082 CCTGAACTCTACACAACATATTT 1253 AAATATGTTGTGTAGAGTTC 3022
4088 4110 CCAAGACCCTACTTCTAACCTCC 1254 GGAGGTTAGAAGTAGGGTCT 3023
4094 4116 CCCTACTTCTAACCTCCCTGTTC 1255 GAACAGGGAGGTTAGAAGTA 3024
4095 4117 CCTACTTCTAACCTCCCTGTTCT 1256 AGAACAGGGAGGTTAGAAGT 3025
4106 4128 CCTCCCTGTTCTTATGAATTCGA 1257 TCGAATTCATAAGAACAGGG 3026
4109 4131 CCCTGTTCTTATGAATTCGAACA 1258 TGTTCGAATTCATAAGAACA 3027
4110 4132 CCTGTTCTTATGAATTCGAACAG 1259 CTGTTCGAATTCATAAGAAC 3028
4137 4159 CCCCCGATTCCGCTACGACCAAC 1260 GTTGGTCGTAGCGGAATCGG 3029
4138 4160 CCCCGATTCCGCTACGACCAACT 1261 AGTTGGTCGTAGCGGAATCG 3030
4139 4161 CCCGATTCCGCTACGACCAACTC 1262 GAGTTGGTCGTAGCGGAATC 3031
4140 4162 CCGATTCCGCTACGACCAACTCA 1263 TGAGTTGGTCGTAGCGGAAT 3032
4146 4168 CCGCTACGACCAACTCATACACC 1264 GGTGTATGAGTTGGTCGTAG 3033
4155 4177 CCAACTCATACACCTCCTATGAA 1265 TTCATAGGAGGTGTATGAGT 3034
4167 4189 CCTCCTATGAAAAAACTTCCTAC 1266 GTAGGAAGTTTTTTCATAGG 3035
4170 4192 CCTATGAAAAAACTTCCTACCAC 1267 GTGGTAGGAAGTTTTTTCAT 3036
4185 4207 CCTACCACTCACCCTAGCATTAC 1268 GTAATGCTAGGGTGAGTGGT 3037
4189 4211 CCACTCACCCTAGCATTACTTAT 1269 ATAAGTAATGCTAGGGTGAG 3038
4196 4218 CCCTAGCATTACTTATATGATAT 1270 ATATCATATAAGTAATGCTA 3039
4197 4219 CCTAGCATTACTTATATGATATG 1271 CATATCATATAAGTAATGCT 3040
4223 4245 CCATACCCATTACAATCTCCAGC 1272 GCTGGAGATTGTAATGGGTA 3041
4228 4250 CCCATTACAATCTCCAGCATTCC 1273 GGAATGCTGGAGATTGTAAT 3042
4229 4251 CCATTACAATCTCCAGCATTCCC 1274 GGGAATGCTGGAGATTGTAA 3043
4241 4263 CCAGCATTCCCCCTCAAACCTAA 1275 TTAGGTTTGAGGGGGAATGC 3044
4249 4271 CCCCCTCAAACCTAAGAAATATG 1276 CATATTTCTTAGGTTTGAGG 3045
4250 4272 CCCCTCAAACCTAAGAAATATGT 1277 ACATATTTCTTAGGTTTGAG 3046
4251 4273 CCCTCAAACCTAAGAAATATGTC 1278 GACATATTTCTTAGGTTTGA 3047
4252 4274 CCTCAAACCTAAGAAATATGTCT 1279 AGACATATTTCTTAGGTTTG 3048
4259 4281 CCTAAGAAATATGTCTGATAAAA 1280 TTTTATCAGACATATTTCTT 3049
4318 4340 CCCCCTTATTTCTAGGACTATGA 1281 TCATAGTCCTAGAAATAAGG 3050
4319 4341 CCCCTTATTTCTAGGACTATGAG 1282 CTCATAGTCCTAGAAATAAG 3051
4320 4342 CCCTTATTTCTAGGACTATGAGA 1283 TCTCATAGTCCTAGAAATAA 3052
4321 4343 CCTTATTTCTAGGACTATGAGAA 1284 TTCTCATAGTCCTAGAAATA 3053
4349 4371 CCCATCCCTGAGAATCCAAAATT 1285 AATTTTGGATTCTCAGGGAT 3054
4350 4372 CCATCCCTGAGAATCCAAAATTC 1286 GAATTTTGGATTCTCAGGGA 3055
4354 4376 CCCTGAGAATCCAAAATTCTCCG 1287 CGGAGAATTTTGGATTCTCA 3056
4355 4377 CCTGAGAATCCAAAATTCTCCGT 1288 ACGGAGAATTTTGGATTCTC 3057
4364 4386 CCAAAATTCTCCGTGCCACCTAT 1289 ATAGGTGGCACGGAGAATTT 3058
4374 4396 CCGTGCCACCTATCACACCCCAT 1290 ATGGGGTGTGATAGGTGGCA 3059
4379 4401 CCACCTATCACACCCCATCCTAA 1291 TTAGGATGGGGTGTGATAGG 3060
4382 4404 CCTATCACACCCCATCCTAAAGT 1292 ACTTTAGGATGGGGTGTGAT 3061
4391 4413 CCCCATCCTAAAGTAAGGTCAGC 1293 GCTGACCTTACTTTAGGATG 3062
4392 4414 CCCATCCTAAAGTAAGGTCAGCT 1294 AGCTGACCTTACTTTAGGAT 3063
4393 4415 CCATCCTAAAGTAAGGTCAGCTA 1295 TAGCTGACCTTACTTTAGGA 3064
4397 4419 CCTAAAGTAAGGTCAGCTAAATA 1296 TATTTAGCTGACCTTACTTT 3065
4430 4452 CCCATACCCCGAAAATGTTGGTT 1297 AACCAACATTTTCGGGGTAT 3066
4431 4453 CCATACCCCGAAAATGTTGGTTA 1298 TAACCAACATTTTCGGGGTA 3067
4436 4458 CCCCGAAAATGTTGGTTATACCC 1299 GGGTATAACCAACATTTTCG 3068
4437 4459 CCCGAAAATGTTGGTTATACCCT 1300 AGGGTATAACCAACATTTTC 3069
4438 4460 CCGAAAATGTTGGTTATACCCTT 1301 AAGGGTATAACCAACATTTT 3070
4456 4478 CCCTTCCCGTACTAATTAATCCC 1302 GGGATTAATTAGTACGGGAA 3071
4457 4479 CCTTCCCGTACTAATTAATCCCC 1303 GGGGATTAATTAGTACGGGA 3072
4461 4483 CCCGTACTAATTAATCCCCTGGC 1304 GCCAGGGGATTAATTAGTAC 3073
4462 4484 CCGTACTAATTAATCCCCTGGCC 1305 GGCCAGGGGATTAATTAGTA 3074
4476 4498 CCCCTGGCCCAACCCGTCATCTA 1306 TAGATGACGGGTTGGGCCAG 3075
4477 4499 CCCTGGCCCAACCCGTCATCTAC 1307 GTAGATGACGGGTTGGGCCA 3076
4478 4500 CCTGGCCCAACCCGTCATCTACT 1308 AGTAGATGACGGGTTGGGCC 3077
4483 4505 CCCAACCCGTCATCTACTCTACC 1309 GGTAGAGTAGATGACGGGTT 3078
4484 4506 CCAACCCGTCATCTACTCTACCA 1310 TGGTAGAGTAGATGACGGGT 3079
4488 4510 CCCGTCATCTACTCTACCATCTT 1311 AAGATGGTAGAGTAGATGAC 3080
4489 4511 CCGTCATCTACTCTACCATCTTT 1312 AAAGATGGTAGAGTAGATGA 3081
4504 4526 CCATCTTTGCAGGCACACTCATC 1313 GATGAGTGTGCCTGCAAAGA 3082
4555 4577 CCTGAGTAGGCCTAGAAATAAAC 1314 GTTTATTTCTAGGCCTACTC 3083
4565 4587 CCTAGAAATAAACATGCTAGCTT 1315 AAGCTAGCATGTTTATTTCT 3084
4593 4615 CCAGTTCTAACCAAAAAAATAAA 1316 TTTATTTTTTTGGTTAGAAC 3085
4603 4625 CCAAAAAAATAAACCCTCGTTCC 1317 GGAACGAGGGTTTATTTTTT 3086
4616 4638 CCCTCGTTCCACAGAAGCTGCCA 1318 TGGCAGCTTCTGTGGAACGA 3087
4617 4639 CCTCGTTCCACAGAAGCTGCCAT 1319 ATGGCAGCTTCTGTGGAACG 3088
4624 4646 CCACAGAAGCTGCCATCAAGTAT 1320 ATACTTGATGGCAGCTTCTG 3089
4636 4658 CCATCAAGTATTTCCTCACGCAA 1321 TTGCGTGAGGAAATACTTGA 3090
4649 4671 CCTCACGCAAGCAACCGCATCCA 1322 TGGATGCGGTTGCTTGCGTG 3091
4663 4685 CCGCATCCATAATCCTTCTAATA 1323 TATTAGAAGGATTATGGATG 3092
4669 4691 CCATAATCCTTCTAATAGCTATC 1324 GATAGCTATTAGAAGGATTA 3093
4676 4698 CCTTCTAATAGCTATCCTCTTCA 1325 TGAAGAGGATAGCTATTAGA 3094
4691 4713 CCTCTTCAACAATATACTCTCCG 1326 CGGAGAGTATATTGTTGAAG 3095
4711 4733 CCGGACAATGAACCATAACCAAT 1327 ATTGGTTATGGTTCATTGTC 3096
4723 4745 CCATAACCAATACTACCAATCAA 1328 TTGATTGGTAGTATTGGTTA 3097
4729 4751 CCAATACTACCAATCAATACTCA 1329 TGAGTATTGATTGGTAGTAT 3098
4738 4760 CCAATCAATACTCATCATTAATA 1330 TATTAATGATGAGTATTGAT 3099
4795 4817 CCCCCTTTCACTTCTGAGTCCCA 1331 TGGGACTCAGAAGTGAAAGG 3100
4796 4818 CCCCTTTCACTTCTGAGTCCCAG 1332 CTGGGACTCAGAAGTGAAAG 3101
4797 4819 CCCTTTCACTTCTGAGTCCCAGA 1333 TCTGGGACTCAGAAGTGAAA 3102
4798 4820 CCTTTCACTTCTGAGTCCCAGAG 1334 CTCTGGGACTCAGAAGTGAA 3103
4814 4836 CCCAGAGGTTACCCAAGGCACCC 1335 GGGTGCCTTGGGTAACCTCT 3104
4815 4837 CCAGAGGTTACCCAAGGCACCCC 1336 GGGGTGCCTTGGGTAACCTC 3105
4825 4847 CCCAAGGCACCCCTCTGACATCC 1337 GGATGTCAGAGGGGTGCCTT 3106
4826 4848 CCAAGGCACCCCTCTGACATCCG 1338 CGGATGTCAGAGGGGTGCCT 3107
4834 4856 CCCCTCTGACATCCGGCCTGCTT 1339 AAGCAGGCCGGATGTCAGAG 3108
4835 4857 CCCTCTGACATCCGGCCTGCTTC 1340 GAAGCAGGCCGGATGTCAGA 3109
4836 4858 CCTCTGACATCCGGCCTGCTTCT 1341 AGAAGCAGGCCGGATGTCAG 3110
4846 4868 CCGGCCTGCTTCTTCTCACATGA 1342 TCATGTGAGAAGAAGCAGGC 3111
4850 4872 CCTGCTTCTTCTCACATGACAAA 1343 TTTGTCATGTGAGAAGAAGC 3112
4879 4901 CCCCCATCTCAATCATATACCAA 1344 TTGGTATATGATTGAGATGG 3113
4880 4902 CCCCATCTCAATCATATACCAAA 1345 TTTGGTATATGATTGAGATG 3114
4881 4903 CCCATCTCAATCATATACCAAAT 1346 ATTTGGTATATGATTGAGAT 3115
4882 4904 CCATCTCAATCATATACCAAATC 1347 GATTTGGTATATGATTGAGA 3116
4898 4920 CCAAATCTCTCCCTCACTAAACG 1348 CGTTTAGTGAGGGAGAGATT 3117
4908 4930 CCCTCACTAAACGTAAGCCTTCT 1349 AGAAGGCTTACGTTTAGTGA 3118
4909 4931 CCTCACTAAACGTAAGCCTTCTC 1350 GAGAAGGCTTACGTTTAGTG 3119
4925 4947 CCTTCTCCTCACTCTCTCAATCT 1351 AGATTGAGAGAGTGAGGAGA 3120
4931 4953 CCTCACTCTCTCAATCTTATCCA 1352 TGGATAAGATTGAGAGAGTG 3121
4951 4973 CCATCATAGCAGGCAGTTGAGGT 1353 ACCTCAACTGCCTGCTATGA 3122
4982 5004 CCAAACCCAGCTACGCAAAATCT 1354 AGATTTTGCGTAGCTGGGTT 3123
4987 5009 CCCAGCTACGCAAAATCTTAGCA 1355 TGCTAAGATTTTGCGTAGCT 3124
4988 5010 CCAGCTACGCAAAATCTTAGCAT 1356 ATGCTAAGATTTTGCGTAGC 3125
5014 5036 CCTCAATTACCCACATAGGATGA 1357 TCATCCTATGTGGGTAATTG 3126
5023 5045 CCCACATAGGATGAATAATAGCA 1358 TGCTATTATTCATCCTATGT 3127
5024 5046 CCACATAGGATGAATAATAGCAG 1359 CTGCTATTATTCATCCTATG 3128
5052 5074 CCGTACAACCCTAACATAACCAT 1360 ATGGTTATGTTAGGGTTGTA 3129
5060 5082 CCCTAACATAACCATTCTTAATT 1361 AATTAAGAATGGTTATGTTA 3130
5061 5083 CCTAACATAACCATTCTTAATTT 1362 AAATTAAGAATGGTTATGTT 3131
5071 5093 CCATTCTTAATTTAACTATTTAT 1363 ATAAATAGTTAAATTAAGAA 3132
5099 5121 CCTAACTACTACCGCATTCCTAC 1364 GTAGGAATGCGGTAGTAGTT 3133
5110 5132 CCGCATTCCTACTACTCAACTTA 1365 TAAGTTGAGTAGTAGGAATG 3134
5117 5139 CCTACTACTCAACTTAAACTCCA 1366 TGGAGTTTAAGTTGAGTAGT 3135
5137 5159 CCAGCACCACGACCCTACTACTA 1367 TAGTAGTAGGGTCGTGGTGC 3136
5143 5165 CCACGACCCTACTACTATCTCGC 1368 GCGAGATAGTAGTAGGGTCG 3137
5149 5171 CCCTACTACTATCTCGCACCTGA 1369 TCAGGTGCGAGATAGTAGTA 3138
5150 5172 CCTACTACTATCTCGCACCTGAA 1370 TTCAGGTGCGAGATAGTAGT 3139
5167 5189 CCTGAAACAAGCTAACATGACTA 1371 TAGTCATGTTAGCTTGTTTC 3140
5193 5215 CCCTTAATTCCATCCACCCTCCT 1372 AGGAGGGTGGATGGAATTAA 3141
5194 5216 CCTTAATTCCATCCACCCTCCTC 1373 GAGGAGGGTGGATGGAATTA 3142
5202 5224 CCATCCACCCTCCTCTCCCTAGG 1374 CCTAGGGAGAGGAGGGTGGA 3143
5206 5228 CCACCCTCCTCTCCCTAGGAGGC 1375 GCCTCCTAGGGAGAGGAGGG 3144
5209 5231 CCCTCCTCTCCCTAGGAGGCCTG 1376 CAGGCCTCCTAGGGAGAGGA 3145
5210 5232 CCTCCTCTCCCTAGGAGGCCTGC 1377 CCAGGCCTCCTAGGGAGAGG 3146
5213 5235 CCTCTCCCTAGGAGGCCTGCCCC 1378 GGGGCAGGCCTCCTAGGGAG 3147
5218 5240 CCCTAGGAGGCCTGCCCCCGCTA 1379 TAGCGGGGGCAGGCCTCCTA 3148
5219 5241 CCTAGGAGGCCTGCCCCCGCTAA 1380 TTAGCGGGGGCAGGCCTCCT 3149
5228 5250 CCTGCCCCCGCTAACCGGCTTTT 1381 AAAAGCCGGTTAGCGGGGGC 3150
5232 5254 CCCCCGCTAACCGGCTTTTTGCC 1382 GGCAAAAAGCCGGTTAGCGG 3151
5233 5255 CCCCGCTAACCGGCTTTTTGCCC 1383 GGGCAAAAAGCCGGTTAGCG 3152
5234 5256 CCCGCTAACCGGCTTTTTGCCCA 1384 TGGGCAAAAAGCCGGTTAGC 3153
5235 5257 CCGCTAACCGGCTTTTTGCCCAA 1385 TTGGGCAAAAAGCCGGTTAG 3154
5242 5264 CCGGCTTTTTGCCCAAATGGGCC 1386 GGCCCATTTGGGCAAAAAGC 3155
5253 5275 CCCAAATGGGCCATTATCGAAGA 1387 TCTTCGATAATGGCCCATTT 3156
5254 5276 CCAAATGGGCCATTATCGAAGAA 1388 TTCTTCGATAATGGCCCATT 3157
5263 5285 CCATTATCGAAGAATTCACAAAA 1389 TTTTGTGAATTCTTCGATAA 3158
5294 5316 CCTCATCATCCCCACCATCATAG 1390 CTATGATGGTGGGGATGATG 3159
5303 5325 CCCCACCATCATAGCCACCATCA 1391 TGATGGTGGCTATGATGGTG 3160
5304 5326 CCCACCATCATAGCCACCATCAC 1392 GTGATGGTGGCTATGATGGT 3161
5305 5327 CCACCATCATAGCCACCATCACC 1393 GGTGATGGTGGCTATGATGG 3162
5308 5330 CCATCATAGCCACCATCACCCTC 1394 GAGGGTGATGGTGGCTATGA 3163
5317 5339 CCACCATCACCCTCCTTAACCTC 1395 GAGGTTAAGGAGGGTGATGG 3164
5320 5342 CCATCACCCTCCTTAACCTCTAC 1396 GTAGAGGTTAAGGAGGGTGA 3165
5326 5348 CCCTCCTTAACCTCTACTTCTAC 1397 GTAGAAGTAGAGGTTAAGGA 3166
5327 5349 CCTCCTTAACCTCTACTTCTACC 1398 GGTAGAAGTAGAGGTTAAGG 3167
5330 5352 CCTTAACCTCTACTTCTACCTAC 1399 GTAGGTAGAAGTAGAGGTTA 3168
5336 5358 CCTCTACTTCTACCTACGCCTAA 1400 TTAGGCGTAGGTAGAAGTAG 3169
5348 5370 CCTACGCCTAATCTACTCCACCT 1401 AGGTGGAGTAGATTAGGCGT 3170
5354 5376 CCTAATCTACTCCACCTCAATCA 1402 TGATTGAGGTGGAGTAGATT 3171
5365 5387 CCACCTCAATCACACTACTCCCC 1403 GGGGAGTAGTGTGATTGAGG 3172
5368 5390 CCTCAATCACACTACTCCCCATA 1404 TATGGGGAGTAGTGTGATTG 3173
5384 5406 CCCCATATCTAACAACGTAAAAA 1405 TTTTTACGTTGTTAGATATG 3174
5385 5407 CCCATATCTAACAACGTAAAAAT 1406 ATTTTTACGTTGTTAGATAT 3175
5386 5408 CCATATCTAACAACGTAAAAATA 1407 TATTTTTACGTTGTTAGATA 3176
5433 5455 CCCACCCCATTCCTCCCCACACT 1408 AGTGTGGGGAGGAATGGGGT 3177
5434 5456 CCACCCCATTCCTCCCCACACTC 1409 GAGTGTGGGGAGGAATGGGG 3178
5437 5459 CCCCATTCCTCCCCACACTCATC 1410 GATGAGTGTGGGGAGGAATG 3179
5438 5460 CCCATTCCTCCCCACACTCATCG 1411 CGATGAGTGTGGGGAGGAAT 3180
5439 5461 CCATTCCTCCCCACACTCATCGC 1412 GCGATGAGTGTGGGGAGGAA 3181
5444 5466 CCTCCCCACACTCATCGCCCTTA 1413 TAAGGGCGATGAGTGTGGGG 3182
5447 5469 CCCCACACTCATCGCCCTTACCA 1414 TGGTAAGGGCGATGAGTGTG 3183
5448 5470 CCCACACTCATCGCCCTTACCAC 1415 GTGGTAAGGGCGATGAGTGT 3184
5449 5471 CCACACTCATCGCCCTTACCACG 1416 CGTGGTAAGGGCGATGAGTG 3185
5461 5483 CCCTTACCACGCTACTCCTACCT 1417 AGGTAGGAGTAGCGTGGTAA 3186
5462 5484 CCTTACCACGCTACTCCTACCTA 1418 TAGGTAGGAGTAGCGTGGTA 3187
5467 5489 CCACGCTACTCCTACCTATCTCC 1419 GGAGATAGGTAGGAGTAGCG 3188
5477 5499 CCTACCTATCTCCCCTTTTATAC 1420 GTATAAAAGGGGAGATAGGT 3189
5481 5503 CCTATCTCCCCTTTTATACTAAT 1421 ATTAGTATAAAAGGGGAGAT 3190
5488 5510 CCCCTTTTATACTAATAATCTTA 1422 TAAGATTATTAGTATAAAAG 3191
5489 5511 CCCTTTTATACTAATAATCTTAT 1423 ATAAGATTATTAGTATAAAA 3192
5490 5512 CCTTTTATACTAATAATCTTATA 1424 TATAAGATTATTAGTATAAA 3193
5534 5556 CCAAGAGCCTTCAAAGCCCTCAG 1425 CTGAGGGCTTTGAAGGCTCT 3194
5541 5563 CCTTCAAAGCCCTCAGTAAGTTG 1426 CAACTTACTGAGGGCTTTGA 3195
5550 5572 CCCTCAGTAAGTTGCAATACTTA 1427 TAAGTATTGCAACTTACTGA 3196
5551 5573 CCTCAGTAAGTTGCAATACTTAA 1428 TTAAGTATTGCAACTTACTG 3197
5601 5623 CCCCACTCTGCATCAACTGAACG 1429 CGTTCAGTTGATGCAGAGTG 3198
5602 5624 CCCACTCTGCATCAACTGAACGC 1430 GCGTTCAGTTGATGCAGAGT 3199
5603 5625 CCACTCTGCATCAACTGAACGCA 1431 TGCGTTCAGTTGATGCAGAG 3200
5632 5654 CCACTTTAATTAAGCTAAGCCCT 1432 AGGGCTTAGCTTAATTAAAG 3201
5651 5673 CCCTTACTAGACCAATGGGACTT 1433 AAGTCCCATTGGTCTAGTAA 3202
5652 5674 CCTTACTAGACCAATGGGACTTA 1434 TAAGTCCCATTGGTCTAGTA 3203
5667 5689 CCAATGGGACTTAAACCCACAAA 1435 TTTGTGGGTTTAAGTCCCAT 3204
5677 5699 CCCACAAACACTTAGTTAACAGC 1436 GCTGTTAACTAAGTGTTTGT 3205
5678 5700 CCACAAACACTTAGTTAACAGCT 1437 AGCTGTTAACTAAGTGTTTG 3206
5706 5728 CCCTAATCAACTGGCTTCAATCT 1438 AGATTGAAGCCAGTTGATTA 3207
5707 5729 CCTAATCAACTGGCTTCAATCTA 1439 TAGATTGAAGCCAGTTGATT 3208
5735 5757 CCCGCCGCCGGGAAAAAAGGCGG 1440 CCGCCTTTTTTCCCGGCGGC 3209
5736 5758 CCGCCGCCGGGAAAAAAGGCGGG 1441 CCCGCCTTTTTTCCCGGCGG 3210
5739 5761 CCGCCGGGAAAAAAGGCGGGAGA 1442 TCTCCCGCCTTTTTTCCCGG 3211
5742 5764 CCGGGAAAAAAGGCGGGAGAAGC 1443 GCTTCTCCCGCCTTTTTTCC 3212
5764 5786 CCCCGGCAGGTTTGAAGCTGCTT 1444 AAGCAGCTTCAAACCTGCCG 3213
5765 5787 CCCGGCAGGTTTGAAGCTGCTTC 1445 GAAGCAGCTTCAAACCTGCC 3214
5766 5788 CCGGCAGGTTTGAAGCTGCTTCT 1446 AGAAGCAGCTTCAAACCTGC 3215
5817 5839 CCTCGGAGCTGGTAAAAAGAGGC 1447 GCCTCTTTTTACCAGCTCCG 3216
5839 5861 CCTAACCCCTGTCTTTAGATTTA 1448 TAAATCTAAAGACAGGGGTT 3217
5844 5866 CCCCTGTCTTTAGATTTACAGTC 1449 GACTGTAAATCTAAAGACAG 3218
5845 5867 CCCTGTCTTTAGATTTACAGTCC 1450 GGACTGTAAATCTAAAGACA 3219
5846 5868 CCTGTCTTTAGATTTACAGTCCA 1451 TGGACTGTAAATCTAAAGAC 3220
5866 5888 CCAATGCTTCACTCAGCCATTTT 1452 AAAATGGCTGAGTGAAGCAT 3221
5882 5904 CCATTTTACCTCACCCCCACTGA 1453 TCAGTGGGGGTGAGGTAAAA 3222
5890 5912 CCTCACCCCCACTGATGTTCGCC 1454 GGCGAACATCAGTGGGGGTG 3223
5895 5917 CCCCCACTGATGTTCGCCGACCG 1455 CGGTCGGCGAACATCAGTGG 3224
5896 5918 CCCCACTGATGTTCGCCGACCGT 1456 ACGGTCGGCGAACATCAGTG 3225
5897 5919 CCCACTGATGTTCGCCGACCGTT 1457 AACGGTCGGCGAACATCAGT 3226
5898 5920 CCACTGATGTTCGCCGACCGTTG 1458 CAACGGTCGGCGAACATCAG 3227
5911 5933 CCGACCGTTGACTATTCTCTACA 1459 TGTAGAGAATAGTCAACGGT 3228
5915 5937 CCGTTGACTATTCTCTACAAACC 1460 GGTTTGTAGAGAATAGTCAA 3229
5936 5958 CCACAAAGACATTGGAACACTAT 1461 ATAGTGTTCCAATGTCTTTG 3230
5960 5982 CCTATTATTCGGCGCATGAGCTG 1462 CAGCTCATGCGCCGAATAAT 3231
5987 6009 CCTAGGCACAGCTCTAAGCCTCC 1463 GGAGGCTTAGAGCTGTGCCT 3232
6005 6027 CCTCCTTATTCGAGCCGAGCTGG 1464 CCAGCTCGGCTCGAATAAGG 3233
6008 6030 CCTTATTCGAGCCGAGCTGGGCC 1465 GGCCCAGCTCGGCTCGAATA 3234
6019 6041 CCGAGCTGGGCCAGCCAGGCAAC 1466 GTTGCCTGGCTGGCCCAGCT 3235
6029 6051 CCAGCCAGGCAACCTTCTAGGTA 1467 TACCTAGAAGGTTGCCTGGC 3236
6033 6055 CCAGGCAACCTTCTAGGTAACGA 1468 TCGTTACCTAGAAGGTTGCC 3237
6041 6063 CCTTCTAGGTAACGACCACATCT 1469 AGATGTGGTCGTTACCTAGA 3238
6056 6078 CCACATCTACAACGTTATCGTCA 1470 TGACGATAACGTTGTAGATG 3239
6082 6104 CCCATGCATTTGTAATAATCTTC 1471 GAAGATTATTACAAATGCAT 3240
6083 6105 CCATGCATTTGTAATAATCTTCT 1472 AGAAGATTATTACAAATGCA 3241
6117 6139 CCCATCATAATCGGAGGCTTTGG 1473 CCAAAGCCTCCGATTATGAT 3242
6118 6140 CCATCATAATCGGAGGCTTTGGC 1474 GCCAAAGCCTCCGATTATGA 3243
6153 6175 CCCCTAATAATCGGTGCCCCCGA 1475 TCGGGGGCACCGATTATTAG 3244
6154 6176 CCCTAATAATCGGTGCCCCCGAT 1476 ATCGGGGGCACCGATTATTA 3245
6155 6177 CCTAATAATCGGTGCCCCCGATA 1477 TATCGGGGGCACCGATTATT 3246
6169 6191 CCCCCGATATGGCGTTTCCCCGC 1478 GCGGGGAAACGCCATATCGG 3247
6170 6192 CCCCGATATGGCGTTTCCCCGCA 1479 TGCGGGGAAACGCCATATCG 3248
6171 6193 CCCGATATGGCGTTTCCCCGCAT 1480 ATGCGGGGAAACGCCATATC 3249
6172 6194 CCGATATGGCGTTTCCCCGCATA 1481 TATGCGGGGAAACGCCATAT 3250
6186 6208 CCCCGCATAAACAACATAAGCTT 1482 AAGCTTATGTTGTTTATGCG 3251
6187 6209 CCCGCATAAACAACATAAGCTTC 1483 GAAGCTTATGTTGTTTATGC 3252
6188 6210 CCGCATAAACAACATAAGCTTCT 1484 AGAAGCTTATGTTGTTTATG 3253
6219 6241 CCTCCCTCTCTCCTACTCCTGCT 1485 AGCAGGAGTAGGAGAGAGGG 3254
6222 6244 CCCTCTCTCCTACTCCTGCTCGC 1486 GCGAGCAGGAGTAGGAGAGA 3255
6223 6245 CCTCTCTCCTACTCCTGCTCGCA 1487 TGCGAGCAGGAGTAGGAGAG 3256
6230 6252 CCTACTCCTGCTCGCATCTGCTA 1488 TAGCAGATGCGAGCAGGAGT 3257
6236 6258 CCTGCTCGCATCTGCTATAGTGG 1489 CCACTATAGCAGATGCGAGC 3258
6262 6284 CCGGAGCAGGAACAGGTTGAACA 1490 TGTTCAACCTGTTCCTGCTC 3259
6290 6312 CCCTCCCTTAGCAGGGAACTACT 1491 AGTAGTTCCCTGCTAAGGGA 3260
6291 6313 CCTCCCTTAGCAGGGAACTACTC 1492 GAGTAGTTCCCTGCTAAGGG 3261
6294 6316 CCCTTAGCAGGGAACTACTCCCA 1493 TGGGAGTAGTTCCCTGCTAA 3262
6295 6317 CCTTAGCAGGGAACTACTCCCAC 1494 GTGGGAGTAGTTCCCTGCTA 3263
6313 6335 CCCACCCTGGAGCCTCCGTAGAC 1495 GTCTACGGAGGCTCCAGGGT 3264
6314 6336 CCACCCTGGAGCCTCCGTAGACC 1496 GGTCTACGGAGGCTCCAGGG 3265
6317 6339 CCCTGGAGCCTCCGTAGACCTAA 1497 TTAGGTCTACGGAGGCTCCA 3266
6318 6340 CCTGGAGCCTCCGTAGACCTAAC 1498 GTTAGGTCTACGGAGGCTCC 3267
6325 6347 CCTCCGTAGACCTAACCATCTTC 1499 GAAGATGGTTAGGTCTACGG 3268
6328 6350 CCGTAGACCTAACCATCTTCTCC 1500 GGAGAAGATGGTTAGGTCTA 3269
6335 6357 CCTAACCATCTTCTCCTTACACC 1501 GGTGTAAGGAGAAGATGGTT 3270
6340 6362 CCATCTTCTCCTTACACCTAGCA 1502 TGCTAGGTGTAAGGAGAAGA 3271
6349 6371 CCTTACACCTAGCAGGTGTCTCC 1503 GGAGACACCTGCTAGGTGTA 3272
6356 6378 CCTAGCAGGTGTCTCCTCTATCT 1504 AGATAGAGGAGACACCTGCT 3273
6370 6392 CCTCTATCTTAGGGGCCATCAAT 1505 ATTGATGGCCCCTAAGATAG 3274
6385 6407 CCATCAATTTCATCACAACAATT 1506 AATTGTTGTGATGAAATTGA 3275
6420 6442 CCCCCTGCCATAACCCAATACCA 1507 TGGTATTGGGTTATGGCAGG 3276
6421 6443 CCCCTGCCATAACCCAATACCAA 1508 TTGGTATTGGGTTATGGCAG 3277
6422 6444 CCCTGCCATAACCCAATACCAAA 1509 TTTGGTATTGGGTTATGGCA 3278
6423 6445 CCTGCCATAACCCAATACCAAAC 1510 GTTTGGTATTGGGTTATGGC 3279
6427 6449 CCATAACCCAATACCAAACGCCC 1511 GGGCGTTTGGTATTGGGTTA 3280
6433 6455 CCCAATACCAAACGCCCCTCTTC 1512 GAAGAGGGGCGTTTGGTATT 3281
6434 6456 CCAATACCAAACGCCCCTCTTCG 1513 CGAAGAGGGGCGTTTGGTAT 3282
6440 6462 CCAAACGCCCCTCTTCGTCTGAT 1514 ATCAGACGAAGAGGGGCGTT 3283
6447 6469 CCCCTCTTCGTCTGATCCGTCCT 1515 AGGACGGATCAGACGAAGAG 3284
6448 6470 CCCTCTTCGTCTGATCCGTCCTA 1516 TAGGACGGATCAGACGAAGA 3285
6449 6471 CCTCTTCGTCTGATCCGTCCTAA 1517 TTAGGACGGATCAGACGAAG 3286
6463 6485 CCGTCCTAATCACAGCAGTCCTA 1518 TAGGACTGCTGTGATTAGGA 3287
6467 6489 CCTAATCACAGCAGTCCTACTTC 1519 GAAGTAGGACTGCTGTGATT 3288
6482 6504 CCTACTTCTCCTATCTCTCCCAG 1520 CTGGGAGAGATAGGAGAAGT 3289
6491 6513 CCTATCTCTCCCAGTCCTAGCTG 1521 CAGCTAGGACTGGGAGAGAT 3290
6500 6522 CCCAGTCCTAGCTGCTGGCATCA 1522 TGATGCCAGCAGCTAGGACT 3291
6501 6523 CCAGTCCTAGCTGCTGGCATCAC 1523 GTGATGCCAGCAGCTAGGAC 3292
6506 6528 CCTAGCTGCTGGCATCACTATAC 1524 GTATAGTGATGCCAGCAGCT 3293
6539 6561 CCGCAACCTCAACACCACCTTCT 1525 AGAAGGTGGTGTTGAGGTTG 3294
6545 6567 CCTCAACACCACCTTCTTCGACC 1526 GGTCGAAGAAGGTGGTGTTG 3295
6553 6575 CCACCTTCTTCGACCCCGCCGGA 1527 TCCGGCGGGGTCGAAGAAGG 3296
6556 6578 CCTTCTTCGACCCCGCCGGAGGA 1528 TCCTCCGGCGGGGTCGAAGA 3297
6566 6588 CCCCGCCGGAGGAGGAGACCCCA 1529 TGGGGTCTCCTCCTCCGGCG 3298
6567 6589 CCCGCCGGAGGAGGAGACCCCAT 1530 ATGGGGTCTCCTCCTCCGGC 3299
6568 6590 CCGCCGGAGGAGGAGACCCCATT 1531 AATGGGGTCTCCTCCTCCGG 3300
6571 6593 CCGGAGGAGGAGACCCCATTCTA 1532 TAGAATGGGGTCTCCTCCTC 3301
6584 6606 CCCCATTCTATACCAACACCTAT 1533 ATAGGTGTTGGTATAGAATG 3302
6585 6607 CCCATTCTATACCAACACCTATT 1534 AATAGGTGTTGGTATAGAAT 3303
6586 6608 CCATTCTATACCAACACCTATTC 1535 GAATAGGTGTTGGTATAGAA 3304
6596 6618 CCAACACCTATTCTGATTTTTCG 1536 CGAAAAATCAGAATAGGTGT 3305
6602 6624 CCTATTCTGATTTTTCGGTCACC 1537 GGTGACCGAAAAATCAGAAT 3306
6623 6645 CCCTGAAGTTTATATTCTTATCC 1538 GGATAAGAATATAAACTTCA 3307
6624 6646 CCTGAAGTTTATATTCTTATCCT 1539 AGGATAAGAATATAAACTTC 3308
6644 6666 CCTACCAGGCTTCGGAATAATCT 1540 AGATTATTCCGAAGCCTGGT 3309
6648 6670 CCAGGCTTCGGAATAATCTCCCA 1541 TGGGAGATTATTCCGAAGCC 3310
6667 6689 CCCATATTGTAACTTACTACTCC 1542 GGAGTAGTAAGTTACAATAT 3311
6668 6690 CCATATTGTAACTTACTACTCCG 1543 CGGAGTAGTAAGTTACAATA 3312
6688 6710 CCGGAAAAAAAGAACCATTTGGA 1544 TCCAAATGGTTCTTTTTTTC 3313
6702 6724 CCATTTGGATACATAGGTATGGT 1545 ACCATACCTATGTATCCAAA 3314
6749 6771 CCTAGGGTTTATCGTGTGAGCAC 1546 GTGCTCACACGATAAACCCT 3315
6773 6795 CCATATATTTACAGTAGGAATAG 1547 CTATTCCTACTGTAAATATA 3316
6820 6842 CCTCCGCTACCATAATCATCGCT 1548 AGCGATGATTATGGTAGCGG 3317
6823 6845 CCGCTACCATAATCATCGCTATC 1549 GATAGCGATGATTATGGTAG 3318
6829 6851 CCATAATCATCGCTATCCCCACC 1550 GGTGGGGATAGCGATGATTA 3319
6845 6867 CCCCACCGGCGTCAAAGTATTTA 1551 TAAATACTTTGACGCCGGTG 3320
6846 6868 CCCACCGGCGTCAAAGTATTTAG 1552 CTAAATACTTTGACGCCGGT 3321
6847 6869 CCACCGGCGTCAAAGTATTTAGC 1553 GCTAAATACTTTGACGCCGG 3322
6850 6872 CCGGCGTCAAAGTATTTAGCTGA 1554 TCAGCTAAATACTTTGACGC 3323
6877 6899 CCACACTCCACGGAAGCAATATG 1555 CATATTGCTTCCGTGGAGTG 3324
6884 6906 CCACGGAAGCAATATGAAATGAT 1556 ATCATTTCATATTGCTTCCG 3325
6925 6947 CCCTAGGATTCATCTTTCTTTTC 1557 GAAAAGAAAGATGAATCCTA 3326
6926 6948 CCTAGGATTCATCTTTCTTTTCA 1558 TGAAAAGAAAGATGAATCCT 3327
6949 6971 CCGTAGGTGGCCTGACTGGCATT 1559 AATGCCAGTCAGGCCACCTA 3328
6959 6981 CCTGACTGGCATTGTATTAGCAA 1560 TTGCTAATACAATGCCAGTC 3329
7027 7049 CCCACTTCCACTATGTCCTATCA 1561 TGATAGGACATAGTGGAAGT 3330
7028 7050 CCACTTCCACTATGTCCTATCAA 1562 TTGATAGGACATAGTGGAAG 3331
7034 7056 CCACTATGTCCTATCAATAGGAG 1563 CTCCTATTGATAGGACATAG 3332
7043 7065 CCTATCAATAGGAGCTGTATTTG 1564 CAAATACAGCTCCTATTGAT 3333
7066 7088 CCATCATAGGAGGCTTCATTCAC 1565 GTGAATGAAGCCTCCTATGA 3334
7095 7117 CCCCTATTCTCAGGCTACACCCT 1566 AGGGTGTAGCCTGAGAATAG 3335
7096 7118 CCCTATTCTCAGGCTACACCCTA 1567 TAGGGTGTAGCCTGAGAATA 3336
7097 7119 CCTATTCTCAGGCTACACCCTAG 1568 CTAGGGTGTAGCCTGAGAAT 3337
7114 7136 CCCTAGACCAAACCTACGCCAAA 1569 TTTGGCGTAGGTTTGGTCTA 3338
7115 7137 CCTAGACCAAACCTACGCCAAAA 1570 TTTTGGCGTAGGTTTGGTCT 3339
7121 7143 CCAAACCTACGCCAAAATCCATT 1571 AATGGATTTTGGCGTAGGTT 3340
7126 7148 CCTACGCCAAAATCCATTTCACT 1572 AGTGAAATGGATTTTGGCGT 3341
7132 7154 CCAAAATCCATTTCACTATCATA 1573 TATGATAGTGAAATGGATTT 3342
7139 7161 CCATTTCACTATCATATTCATCG 1574 CGATGAATATGATAGTGAAA 3343
7181 7203 CCCACAACACTTTCTCGGCCTAT 1575 ATAGGCCGAGAAAGTGTTGT 3344
7182 7204 CCACAACACTTTCTCGGCCTATC 1576 GATAGGCCGAGAAAGTGTTG 3345
7199 7221 CCTATCCGGAATGCCCCGACGTT 1577 AACGTCGGGGCATTCCGGAT 3346
7204 7226 CCGGAATGCCCCGACGTTACTCG 1578 CGAGTAACGTCGGGGCATTC 3347
7212 7234 CCCCGACGTTACTCGGACTACCC 1579 GGGTAGTCCGAGTAACGTCG 3348
7213 7235 CCCGACGTTACTCGGACTACCCC 1580 GGGGTAGTCCGAGTAACGTC 3349
7214 7236 CCGACGTTACTCGGACTACCCCG 1581 CGGGGTAGTCCGAGTAACGT 3350
7232 7254 CCCCGATGCATACACCACATGAA 1582 TTCATGTGGTGTATGCATCG 3351
7233 7255 CCCGATGCATACACCACATGAAA 1583 TTTCATGTGGTGTATGCATC 3352
7234 7256 CCGATGCATACACCACATGAAAC 1584 GTTTCATGTGGTGTATGCAT 3353
7246 7268 CCACATGAAACATCCTATCATCT 1585 AGATGATAGGATGTTTCATG 3354
7259 7281 CCTATCATCTGTAGGCTCATTCA 1586 TGAATGAGCCTACAGATGAT 3355
7327 7349 CCTTCGCTTCGAAGCGAAAAGTC 1587 GACTTTTCGCTTCGAAGCGA 3356
7349 7371 CCTAATAGTAGAAGAACCCTCCA 1588 TGGAGGGTTCTTCTACTATT 3357
7365 7387 CCCTCCATAAACCTGGAGTGACT 1589 AGTCACTCCAGGTTTATGGA 3358
7366 7388 CCTCCATAAACCTGGAGTGACTA 1590 TAGTCACTCCAGGTTTATGG 3359
7369 7391 CCATAAACCTGGAGTGACTATAT 1591 ATATAGTCACTCCAGGTTTA 3360
7376 7398 CCTGGAGTGACTATATGGATGCC 1592 GGCATCCATATAGTCACTCC 3361
7397 7419 CCCCCCACCCTACCACACATTCG 1593 CGAATGTGTGGTAGGGTGGG 3362
7398 7420 CCCCCACCCTACCACACATTCGA 1594 TCGAATGTGTGGTAGGGTGG 3363
7399 7421 CCCCACCCTACCACACATTCGAA 1595 TTCGAATGTGTGGTAGGGTG 3364
7400 7422 CCCACCCTACCACACATTCGAAG 1596 CTTCGAATGTGTGGTAGGGT 3365
7401 7423 CCACCCTACCACACATTCGAAGA 1597 TCTTCGAATGTGTGGTAGGG 3366
7404 7426 CCCTACCACACATTCGAAGAACC 1598 GGTTCTTCGAATGTGTGGTA 3367
7405 7427 CCTACCACACATTCGAAGAACCC 1599 GGGTTCTTCGAATGTGTGGT 3368
7409 7431 CCACACATTCGAAGAACCCGTAT 1600 ATACGGGTTCTTCGAATGTG 3369
7425 7447 CCCGTATACATAAAATCTAGACA 1601 TGTCTAGATTTTATGTATAC 3370
7426 7448 CCGTATACATAAAATCTAGACAA 1602 TTGTCTAGATTTTATGTATA 3371
7466 7488 CCCCCCAAAGCTGGTTTCAAGCC 1603 GGCTTGAAACCAGCTTTGGG 3372
7467 7489 CCCCCAAAGCTGGTTTCAAGCCA 1604 TGGCTTGAAACCAGCTTTGG 3373
7468 7490 CCCCAAAGCTGGTTTCAAGCCAA 1605 TTGGCTTGAAACCAGCTTTG 3374
7469 7491 CCCAAAGCTGGTTTCAAGCCAAC 1606 GTTGGCTTGAAACCAGCTTT 3375
7470 7492 CCAAAGCTGGTTTCAAGCCAACC 1607 GGTTGGCTTGAAACCAGCTT 3376
7487 7509 CCAACCCCATGGCCTCCATGACT 1608 AGTCATGGAGGCCATGGGGT 3377
7491 7513 CCCCATGGCCTCCATGACTTTTT 1609 AAAAAGTCATGGAGGCCATG 3378
7492 7514 CCCATGGCCTCCATGACTTTTTC 1610 GAAAAAGTCATGGAGGCCAT 3379
7493 7515 CCATGGCCTCCATGACTTTTTCA 1611 TGAAAAAGTCATGGAGGCCA 3380
7499 7521 CCTCCATGACTTTTTCAAAAAGG 1612 CCTTTTTGAAAAAGTCATGG 3381
7502 7524 CCATGACTTTTTCAAAAAGGTAT 1613 ATACCTTTTTGAAAAAGTCA 3382
7533 7555 CCATTTCATAACTTTGTCAAAGT 1614 ACTTTGACAAAGTTATGAAA 3383
7573 7595 CCTATATATCTTAATGGCACATG 1615 CATGTGCCATTAAGATATAT 3384
7626 7648 CCCCTATCATAGAAGAGCTTATC 1616 GATAAGCTCTTCTATGATAG 3385
7627 7649 CCCTATCATAGAAGAGCTTATCA 1617 TGATAAGCTCTTCTATGATA 3386
7628 7650 CCTATCATAGAAGAGCTTATCAC 1618 GTGATAAGCTCTTCTATGAT 3387
7650 7672 CCTTTCATGATCACGCCCTCATA 1619 TATGAGGGCGTGATCATGAA 3388
7665 7687 CCCTCATAATCATTTTCCTTATC 1620 GATAAGGAAAATGATTATGA 3389
7666 7688 CCTCATAATCATTTTCCTTATCT 1621 AGATAAGGAAAATGATTATG 3390
7681 7703 CCTTATCTGCTTCCTAGTCCTGT 1622 ACAGGACTAGGAAGCAGATA 3391
7693 7715 CCTAGTCCTGTATGCCCTTTTCC 1623 GGAAAAGGGCATACAGGACT 3392
7699 7721 CCTGTATGCCCTTTTCCTAACAC 1624 GTGTTAGGAAAAGGGCATAC 3393
7707 7729 CCCTTTTCCTAACACTCACAACA 1625 TGTTGTGAGTGTTAGGAAAA 3394
7708 7730 CCTTTTCCTAACACTCACAACAA 1626 TTGTTGTGAGTGTTAGGAAA 3395
7714 7736 CCTAACACTCACAACAAAACTAA 1627 TTAGTTTTGTTGTGAGTGTT 3396
7773 7795 CCGTCTGAACTATCCTGCCCGCC 1628 GGCGGGCAGGATAGTTCAGA 3397
7786 7808 CCTGCCCGCCATCATCCTAGTCC 1629 GGACTAGGATGATGGCGGGC 3398
7790 7812 CCCGCCATCATCCTAGTCCTCAT 1630 ATGAGGACTAGGATGATGGC 3399
7791 7813 CCGCCATCATCCTAGTCCTCATC 1631 GATGAGGACTAGGATGATGG 3400
7794 7816 CCATCATCCTAGTCCTCATCGCC 1632 GGCGATGAGGACTAGGATGA 3401
7801 7823 CCTAGTCCTCATCGCCCTCCCAT 1633 ATGGGAGGGCGATGAGGACT 3402
7807 7829 CCTCATCGCCCTCCCATCCCTAC 1634 GTAGGGATGGGAGGGCGATG 3403
7815 7837 CCCTCCCATCCCTACGCATCCTT 1635 AAGGATGCGTAGGGATGGGA 3404
7816 7838 CCTCCCATCCCTACGCATCCTTT 1636 AAAGGATGCGTAGGGATGGG 3405
7819 7841 CCCATCCCTACGCATCCTTTACA 1637 TGTAAAGGATGCGTAGGGAT 3406
7820 7842 CCATCCCTACGCATCCTTTACAT 1638 ATGTAAAGGATGCGTAGGGA 3407
7824 7846 CCCTACGCATCCTTTACATAACA 1639 TGTTATGTAAAGGATGCGTA 3408
7825 7847 CCTACGCATCCTTTACATAACAG 1640 CTGTTATGTAAAGGATGCGT 3409
7834 7856 CCTTTACATAACAGACGAGGTCA 1641 TGACCTCGTCTGTTATGTAA 3410
7862 7884 CCCTCCCTTACCATCAAATCAAT 1642 ATTGATTTGATGGTAAGGGA 3411
7863 7885 CCTCCCTTACCATCAAATCAATT 1643 AATTGATTTGATGGTAAGGG 3412
7866 7888 CCCTTACCATCAAATCAATTGGC 1644 GCCAATTGATTTGATGGTAA 3413
7867 7889 CCTTACCATCAAATCAATTGGCC 1645 GGCCAATTGATTTGATGGTA 3414
7872 7894 CCATCAAATCAATTGGCCACCAA 1646 TTGGTGGCCAATTGATTTGA 3415
7888 7910 CCACCAATGGTACTGAACCTACG 1647 CGTAGGTTCAGTACCATTGG 3416
7891 7913 CCAATGGTACTGAACCTACGAGT 1648 ACTCGTAGGTTCAGTACCAT 3417
7905 7927 CCTACGAGTACACCGACTACGGC 1649 GCCGTAGTCGGTGTACTCGT 3418
7917 7939 CCGACTACGGCGGACTAATCTTC 1650 GAAGATTAGTCCGCCGTAGT 3419
7944 7966 CCTACATACTTCCCCCATTATTC 1651 GAATAATGGGGGAAGTATGT 3420
7955 7977 CCCCCATTATTCCTAGAACCAGG 1652 CCTGGTTCTAGGAATAATGG 3421
7956 7978 CCCCATTATTCCTAGAACCAGGC 1653 GCCTGGTTCTAGGAATAATG 3422
7957 7979 CCCATTATTCCTAGAACCAGGCG 1654 CGCCTGGTTCTAGGAATAAT 3423
7958 7980 CCATTATTCCTAGAACCAGGCGA 1655 TCGCCTGGTTCTAGGAATAA 3424
7966 7988 CCTAGAACCAGGCGACCTGCGAC 1656 GTCGCAGGTCGCCTGGTTCT 3425
7973 7995 CCAGGCGACCTGCGACTCCTTGA 1657 TCAAGGAGTCGCAGGTCGCC 3426
7981 8003 CCTGCGACTCCTTGACGTTGACA 1658 TGTCAACGTCAAGGAGTCGC 3427
7990 8012 CCTTGACGTTGACAATCGAGTAG 1659 CTACTCGATTGTCAACGTCA 3428
8017 8039 CCCGATTGAAGCCCCCATTCGTA 1660 TACGAATGGGGGCTTCAATC 3429
8018 8040 CCGATTGAAGCCCCCATTCGTAT 1661 ATACGAATGGGGGCTTCAAT 3430
8028 8050 CCCCCATTCGTATAATAATTACA 1662 TGTAATTATTATACGAATGG 3431
8029 8051 CCCCATTCGTATAATAATTACAT 1663 ATGTAATTATTATACGAATG 3432
8030 8052 CCCATTCGTATAATAATTACATC 1664 GATGTAATTATTATACGAAT 3433
8031 8053 CCATTCGTATAATAATTACATCA 1665 TGATGTAATTATTATACGAA 3434
8080 8102 CCCCACATTAGGCTTAAAAACAG 1666 CTGTTTTTAAGCCTAATGTG 3435
8081 8103 CCCACATTAGGCTTAAAAACAGA 1667 TCTGTTTTTAAGCCTAATGT 3436
8082 8104 CCACATTAGGCTTAAAAACAGAT 1668 ATCTGTTTTTAAGCCTAATG 3437
8111 8133 CCCGGACGTCTAAACCAAACCAC 1669 GTGGTTTGGTTTAGACGTCC 3438
8112 8134 CCGGACGTCTAAACCAAACCACT 1670 AGTGGTTTGGTTTAGACGTC 3439
8125 8147 CCAAACCACTTTCACCGCTACAC 1671 GTGTAGCGGTGAAAGTGGTT 3440
8130 8152 CCACTTTCACCGCTACACGACCG 1672 CGGTCGTGTAGCGGTGAAAG 3441
8139 8161 CCGCTACACGACCGGGGGTATAC 1673 GTATACCCCCGGTCGTGTAG 3442
8150 8172 CCGGGGGTATACTACGGTCAATG 1674 CATTGACCGTAGTATACCCC 3443
8194 8216 CCACAGTTTCATGCCCATCGTCC 1675 GGACGATGGGCATGAAACTG 3444
8207 8229 CCCATCGTCCTAGAATTAATTCC 1676 GGAATTAATTCTAGGACGAT 3445
8208 8230 CCATCGTCCTAGAATTAATTCCC 1677 GGGAATTAATTCTAGGACGA 3446
8215 8237 CCTAGAATTAATTCCCCTAAAAA 1678 TTTTTAGGGGAATTAATTCT 3447
8228 8250 CCCCTAAAAATCTTTGAAATAGG 1679 CCTATTTCAAAGATTTTTAG 3448
8229 8251 CCCTAAAAATCTTTGAAATAGGG 1680 CCCTATTTCAAAGATTTTTA 3449
8230 8252 CCTAAAAATCTTTGAAATAGGGC 1681 GCCCTATTTCAAAGATTTTT 3450
8252 8274 CCCGTATTTACCCTATAGCACCC 1682 GGGTGCTATAGGGTAAATAC 3451
8253 8275 CCGTATTTACCCTATAGCACCCC 1683 GGGGTGCTATAGGGTAAATA 3452
8262 8284 CCCTATAGCACCCCCTCTACCCC 1684 GGGGTAGAGGGGGTGCTATA 3453
8263 8285 CCTATAGCACCCCCTCTACCCCC 1685 GGGGGTAGAGGGGGTGCTAT 3454
8272 8294 CCCCCTCTACCCCCTCTAGAGCC 1686 GGCTCTAGAGGGGGTAGAGG 3455
8273 8295 CCCCTCTACCCCCTCTAGAGCCC 1687 GGGCTCTAGAGGGGGTAGAG 3456
8274 8296 CCCTCTACCCCCTCTAGAGCCCA 1688 TGGGCTCTAGAGGGGGTAGA 3457
8275 8297 CCTCTACCCCCTCTAGAGCCCAC 1689 GTGGGCTCTAGAGGGGGTAG 3458
8281 8303 CCCCCTCTAGAGCCCACTGTAAA 1690 TTTACAGTGGGCTCTAGAGG 3459
8282 8304 CCCCTCTAGAGCCCACTGTAAAG 1691 CTTTACAGTGGGCTCTAGAG 3460
8283 8305 CCCTCTAGAGCCCACTGTAAAGC 1692 GCTTTACAGTGGGCTCTAGA 3461
8284 8306 CCTCTAGAGCCCACTGTAAAGCT 1693 AGCTTTACAGTGGGCTCTAG 3462
8293 8315 CCCACTGTAAAGCTAACTTAGCA 1694 TGCTAAGTTAGCTTTACAGT 3463
8294 8316 CCACTGTAAAGCTAACTTAGCAT 1695 ATGCTAAGTTAGCTTTACAG 3464
8320 8342 CCTTTTAAGTTAAAGATTAAGAG 1696 CTCTTAATCTTTAACTTAAA 3465
8345 8367 CCAACACCTCTTTACAGTGAAAT 1697 ATTTCACTGTAAAGAGGTGT 3466
8351 8373 CCTCTTTACAGTGAAATGCCCCA 1698 TGGGGCATTTCACTGTAAAG 3467
8369 8391 CCCCAACTAAATACTACCGTATG 1699 CATACGGTAGTATTTAGTTG 3468
8370 8392 CCCAACTAAATACTACCGTATGG 1700 CCATACGGTAGTATTTAGTT 3469
8371 8393 CCAACTAAATACTACCGTATGGC 1701 GCCATACGGTAGTATTTAGT 3470
8385 8407 CCGTATGGCCCACCATAATTACC 1702 GGTAATTATGGTGGGCCATA 3471
8393 8415 CCCACCATAATTACCCCCATACT 1703 AGTATGGGGGTAATTATGGT 3472
8394 8416 CCACCATAATTACCCCCATACTC 1704 GAGTATGGGGGTAATTATGG 3473
8397 8419 CCATAATTACCCCCATACTCCTT 1705 AAGGAGTATGGGGGTAATTA 3474
8406 8428 CCCCCATACTCCTTACACTATTC 1706 GAATAGTGTAAGGAGTATGG 3475
8407 8429 CCCCATACTCCTTACACTATTCC 1707 GGAATAGTGTAAGGAGTATG 3476
8408 8430 CCCATACTCCTTACACTATTCCT 1708 AGGAATAGTGTAAGGAGTAT 3477
8409 8431 CCATACTCCTTACACTATTCCTC 1709 GAGGAATAGTGTAAGGAGTA 3478
8416 8438 CCTTACACTATTCCTCATCACCC 1710 GGGTGATGAGGAATAGTGTA 3479
8428 8450 CCTCATCACCCAACTAAAAATAT 1711 ATATTTTTAGTTGGGTGATG 3480
8436 8458 CCCAACTAAAAATATTAAACACA 1712 TGTGTTTAATATTTTTAGTT 3481
8437 8459 CCAACTAAAAATATTAAACACAA 1713 TTGTGTTTAATATTTTTAGT 3482
8464 8486 CCACCTACCTCCCTCACCAAAGC 1714 GCTTTGGTGAGGGAGGTAGG 3483
8467 8489 CCTACCTCCCTCACCAAAGCCCA 1715 TGGGCTTTGGTGAGGGAGGT 3484
8471 8493 CCTCCCTCACCAAAGCCCATAAA 1716 TTTATGGGCTTTGGTGAGGG 3485
8474 8496 CCCTCACCAAAGCCCATAAAAAT 1717 ATTTTTATGGGCTTTGGTGA 3486
8475 8497 CCTCACCAAAGCCCATAAAAATA 1718 TATTTTTATGGGCTTTGGTG 3487
8480 8502 CCAAAGCCCATAAAAATAAAAAA 1719 TTTTTTATTTTTATGGGCTT 3488
8486 8508 CCCATAAAAATAAAAAATTATAA 1720 TTATAATTTTTTATTTTTAT 3489
8487 8509 CCATAAAAATAAAAAATTATAAC 1721 GTTATAATTTTTTATTTTTA 3490
8513 8535 CCCTGAGAACCAAAATGAACGAA 1722 TTCGTTCATTTTGGTTCTCA 3491
8514 8536 CCTGAGAACCAAAATGAACGAAA 1723 TTTCGTTCATTTTGGTTCTC 3492
8522 8544 CCAAAATGAACGAAAATCTGTTC 1724 GAACAGATTTTCGTTCATTT 3493
8558 8580 CCCCCACAATCCTAGGCCTACCC 1725 GGGTAGGCCTAGGATTGTGG 3494
8559 8581 CCCCACAATCCTAGGCCTACCCG 1726 CGGGTAGGCCTAGGATTGTG 3495
8560 8582 CCCACAATCCTAGGCCTACCCGC 1727 GCGGGTAGGCCTAGGATTGT 3496
8561 8583 CCACAATCCTAGGCCTACCCGCC 1728 GGCGGGTAGGCCTAGGATTG 3497
8568 8590 CCTAGGCCTACCCGCCGCAGTAC 1729 GTACTGCGGCGGGTAGGCCT 3498
8574 8596 CCTACCCGCCGCAGTACTGATCA 1730 TGATCAGTACTGCGGCGGGT 3499
8578 8600 CCCGCCGCAGTACTGATCATTCT 1731 AGAATGATCAGTACTGCGGC 3500
8579 8601 CCGCCGCAGTACTGATCATTCTA 1732 TAGAATGATCAGTACTGCGG 3501
8582 8604 CCGCAGTACTGATCATTCTATTT 1733 AAATAGAATGATCAGTACTG 3502
8605 8627 CCCCCTCTATTGATCCCCACCTC 1734 GAGGTGGGGATCAATAGAGG 3503
8606 8628 CCCCTCTATTGATCCCCACCTCC 1735 GGAGGTGGGGATCAATAGAG 3504
8607 8629 CCCTCTATTGATCCCCACCTCCA 1736 TGGAGGTGGGGATCAATAGA 3505
8608 8630 CCTCTATTGATCCCCACCTCCAA 1737 TTGGAGGTGGGGATCAATAG 3506
8619 8641 CCCCACCTCCAAATATCTCATCA 1738 TGATGAGATATTTGGAGGTG 3507
8620 8642 CCCACCTCCAAATATCTCATCAA 1739 TTGATGAGATATTTGGAGGT 3508
8621 8643 CCACCTCCAAATATCTCATCAAC 1740 GTTGATGAGATATTTGGAGG 3509
8624 8646 CCTCCAAATATCTCATCAACAAC 1741 GTTGTTGATGAGATATTTGG 3510
8627 8649 CCAAATATCTCATCAACAACCGA 1742 TCGGTTGTTGATGAGATATT 3511
8646 8668 CCGACTAATCACCACCCAACAAT 1743 ATTGTTGGGTGGTGATTAGT 3512
8657 8679 CCACCCAACAATGACTAATCAAA 1744 TTTGATTAGTCATTGTTGGG 3513
8660 8682 CCCAACAATGACTAATCAAACTA 1745 TAGTTTGATTAGTCATTGTT 3514
8661 8683 CCAACAATGACTAATCAAACTAA 1746 TTAGTTTGATTAGTCATTGT 3515
8684 8706 CCTCAAAACAAATGATAACCATA 1747 TATGGTTATCATTTGTTTTG 3516
8702 8724 CCATACACAACACTAAAGGACGA 1748 TCGTCCTTTAGTGTTGTGTA 3517
8726 8748 CCTGATCTCTTATACTAGTATCC 1749 GGATACTAGTATAAGAGATC 3518
8747 8769 CCTTAATCATTTTTATTGCCACA 1750 TGTGGCAATAAAAATGATTA 3519
8765 8787 CCACAACTAACCTCCTCGGACTC 1751 GAGTCCGAGGAGGTTAGTTG 3520
8775 8797 CCTCCTCGGACTCCTGCCTCACT 1752 AGTGAGGCAGGAGTCCGAGG 3521
8778 8800 CCTCGGACTCCTGCCTCACTCAT 1753 ATGAGTGAGGCAGGAGTCCG 3522
8787 8809 CCTGCCTCACTCATTTACACCAA 1754 TTGGTGTAAATGAGTGAGGC 3523
8791 8813 CCTCACTCATTTACACCAACCAC 1755 GTGGTTGGTGTAAATGAGTG 3524
8806 8828 CCAACCACCCAACTATCTATAAA 1756 TTTATAGATAGTTGGGTGGT 3525
8810 8832 CCACCCAACTATCTATAAACCTA 1757 TAGGTTTATAGATAGTTGGG 3526
8813 8835 CCCAACTATCTATAAACCTAGCC 1758 GGCTAGGTTTATAGATAGTT 3527
8814 8836 CCAACTATCTATAAACCTAGCCA 1759 TGGCTAGGTTTATAGATAGT 3528
8829 8851 CCTAGCCATGGCCATCCCCTTAT 1760 ATAAGGGGATGGCCATGGCT 3529
8834 8856 CCATGGCCATCCCCTTATGAGCG 1761 CGCTCATAAGGGGATGGCCA 3530
8840 8862 CCATCCCCTTATGAGCGGGCACA 1762 TGTGCCCGCTCATAAGGGGA 3531
8844 8866 CCCCTTATGAGCGGGCACAGTGA 1763 TCACTGTGCCCGCTCATAAG 3532
8845 8867 CCCTTATGAGCGGGCACAGTGAT 1764 ATCACTGTGCCCGCTCATAA 3533
8846 8868 CCTTATGAGCGGGCACAGTGATT 1765 AATCACTGTGCCCGCTCATA 3534
8897 8919 CCCTAGCCCACTTCTTACCACAA 1766 TTGTGGTAAGAAGTGGGCTA 3535
8898 8920 CCTAGCCCACTTCTTACCACAAG 1767 CTTGTGGTAAGAAGTGGGCT 3536
8903 8925 CCCACTTCTTACCACAAGGCACA 1768 TGTGCCTTGTGGTAAGAAGT 3537
8904 8926 CCACTTCTTACCACAAGGCACAC 1769 GTGTGCCTTGTGGTAAGAAG 3538
8914 8936 CCACAAGGCACACCTACACCCCT 1770 AGGGGTGTAGGTGTGCCTTG 3539
8926 8948 CCTACACCCCTTATCCCCATACT 1771 AGTATGGGGATAAGGGGTGT 3540
8932 8954 CCCCTTATCCCCATACTAGTTAT 1772 ATAACTAGTATGGGGATAAG 3541
8933 8955 CCCTTATCCCCATACTAGTTATT 1773 AATAACTAGTATGGGGATAA 3542
8934 8956 CCTTATCCCCATACTAGTTATTA 1774 TAATAACTAGTATGGGGATA 3543
8940 8962 CCCCATACTAGTTATTATCGAAA 1775 TTTCGATAATAACTAGTATG 3544
8941 8963 CCCATACTAGTTATTATCGAAAC 1776 GTTTCGATAATAACTAGTAT 3545
8942 8964 CCATACTAGTTATTATCGAAACC 1777 GGTTTCGATAATAACTAGTA 3546
8963 8985 CCATCAGCCTACTCATTCAACCA 1778 TGGTTGAATGAGTAGGCTGA 3547
8970 8992 CCTACTCATTCAACCAATAGCCC 1779 GGGCTATTGGTTGAATGAGT 3548
8983 9005 CCAATAGCCCTGGCCGTACGCCT 1780 AGGCGTACGGCCAGGGCTAT 3549
8990 9012 CCCTGGCCGTACGCCTAACCGCT 1781 AGCGGTTAGGCGTACGGCCA 3550
8991 9013 CCTGGCCGTACGCCTAACCGCTA 1782 TAGCGGTTAGGCGTACGGCC 3551
8996 9018 CCGTACGCCTAACCGCTAACATT 1783 AATGTTAGCGGTTAGGCGTA 3552
9003 9025 CCTAACCGCTAACATTACTGCAG 1784 CTGCAGTAATGTTAGCGGTT 3553
9008 9030 CCGCTAACATTACTGCAGGCCAC 1785 GTGGCCTGCAGTAATGTTAG 3554
9027 9049 CCACCTACTCATGCACCTAATTG 1786 CAATTAGGTGCATGAGTAGG 3555
9030 9052 CCTACTCATGCACCTAATTGGAA 1787 TTCCAATTAGGTGCATGAGT 3556
9042 9064 CCTAATTGGAAGCGCCACCCTAG 1788 CTAGGGTGGCGCTTCCAATT 3557
9056 9078 CCACCCTAGCAATATCAACCATT 1789 AATGGTTGATATTGCTAGGG 3558
9059 9081 CCCTAGCAATATCAACCATTAAC 1790 GTTAATGGTTGATATTGCTA 3559
9060 9082 CCTAGCAATATCAACCATTAACC 1791 GGTTAATGGTTGATATTGCT 3560
9074 9096 CCATTAACCTTCCCTCTACACTT 1792 AAGTGTAGAGGGAAGGTTAA 3561
9081 9103 CCTTCCCTCTACACTTATCATCT 1793 AGATGATAAGTGTAGAGGGA 3562
9085 9107 CCCTCTACACTTATCATCTTCAC 1794 GTGAAGATGATAAGTGTAGA 3563
9086 9108 CCTCTACACTTATCATCTTCACA 1795 TGTGAAGATGATAAGTGTAG 3564
9129 9151 CCTAGAAATCGCTGTCGCCTTAA 1796 TTAAGGCGACAGCGATTTCT 3565
9146 9168 CCTTAATCCAAGCCTACGTTTTC 1797 GAAAACGTAGGCTTGGATTA 3566
9153 9175 CCAAGCCTACGTTTTCACACTTC 1798 GAAGTGTGAAAACGTAGGCT 3567
9158 9180 CCTACGTTTTCACACTTCTAGTA 1799 TACTAGAAGTGTGAAAACGT 3568
9183 9205 CCTCTACCTGCACGACAACACAT 1800 ATGTGTTGTCGTGCAGGTAG 3569
9189 9211 CCTGCACGACAACACATAATGAC 1801 GTCATTATGTGTTGTCGTGC 3570
9211 9233 CCCACCAATCACATGCCTATCAT 1802 ATGATAGGCATGTGATTGGT 3571
9212 9234 CCACCAATCACATGCCTATCATA 1803 TATGATAGGCATGTGATTGG 3572
9215 9237 CCAATCACATGCCTATCATATAG 1804 CTATATGATAGGCATGTGAT 3573
9226 9248 CCTATCATATAGTAAAACCCAGC 1805 GCTGGGTTTTACTATATGAT 3574
9243 9265 CCCAGCCCATGACCCCTAACAGG 1806 CCTGTTAGGGGTCATGGGCT 3575
9244 9266 CCAGCCCATGACCCCTAACAGGG 1807 CCCTGTTAGGGGTCATGGGC 3576
9248 9270 CCCATGACCCCTAACAGGGGCCC 1808 GGGCCCCTGTTAGGGGTCAT 3577
9249 9271 CCATGACCCCTAACAGGGGCCCT 1809 AGGGCCCCTGTTAGGGGTCA 3578
9255 9277 CCCCTAACAGGGGCCCTCTCAGC 1810 GCTGAGAGGGCCCCTGTTAG 3579
9256 9278 CCCTAACAGGGGCCCTCTCAGCC 1811 GGCTGAGAGGGCCCCTGTTA 3580
9257 9279 CCTAACAGGGGCCCTCTCAGCCC 1812 GGGCTGAGAGGGCCCCTGTT 3581
9268 9290 CCCTCTCAGCCCTCCTAATGACC 1813 GGTCATTAGGAGGGCTGAGA 3582
9269 9291 CCTCTCAGCCCTCCTAATGACCT 1814 AGGTCATTAGGAGGGCTGAG 3583
9277 9299 CCCTCCTAATGACCTCCGGCCTA 1815 TAGGCCGGAGGTCATTAGGA 3584
9278 9300 CCTCCTAATGACCTCCGGCCTAG 1816 CTAGGCCGGAGGTCATTAGG 3585
9281 9303 CCTAATGACCTCCGGCCTAGCCA 1817 TGGCTAGGCCGGAGGTCATT 3586
9289 9311 CCTCCGGCCTAGCCATGTGATTT 1818 AAATCACATGGCTAGGCCGG 3587
9292 9314 CCGGCCTAGCCATGTGATTTCAC 1819 GTGAAATCACATGGCTAGGC 3588
9296 9318 CCTAGCCATGTGATTTCACTTCC 1820 GGAAGTGAAATCACATGGCT 3589
9301 9323 CCATGTGATTTCACTTCCACTCC 1821 GGAGTGGAAGTGAAATCACA 3590
9317 9339 CCACTCCATAACGCTCCTCATAC 1822 GTATGAGGAGCGTTATGGAG 3591
9322 9344 CCATAACGCTCCTCATACTAGGC 1823 GCCTAGTATGAGGAGCGTTA 3592
9332 9354 CCTCATACTAGGCCTACTAACCA 1824 TGGTTAGTAGGCCTAGTATG 3593
9344 9366 CCTACTAACCAACACACTAACCA 1825 TGGTTAGTGTGTTGGTTAGT 3594
9352 9374 CCAACACACTAACCATATACCAA 1826 TTGGTATATGGTTAGTGTGT 3595
9364 9386 CCATATACCAATGATGGCGCGAT 1827 ATCGCGCCATCATTGGTATA 3596
9371 9393 CCAATGATGGCGCGATGTAACAC 1828 GTGTTACATCGCGCCATCAT 3597
9407 9429 CCAAGGCCACCACACACCACCTG 1829 CAGGTGGTGTGTGGTGGCCT 3598
9413 9435 CCACCACACACCACCTGTCCAAA 1830 TTTGGACAGGTGGTGTGTGG 3599
9416 9438 CCACACACCACCTGTCCAAAAAG 1831 CTTTTTGGACAGGTGGTGTG 3600
9423 9445 CCACCTGTCCAAAAAGGCCTTCG 1832 CGAAGGCCTTTTTGGACAGG 3601
9426 9448 CCTGTCCAAAAAGGCCTTCGATA 1833 TATCGAAGGCCTTTTTGGAC 3602
9431 9453 CCAAAAAGGCCTTCGATACGGGA 1834 TCCCGTATCGAAGGCCTTTT 3603
9440 9462 CCTTCGATACGGGATAATCCTAT 1835 ATAGGATTATCCCGTATCGA 3604
9458 9480 CCTATTTATTACCTCAGAAGTTT 1836 AAACTTCTGAGGTAATAAAT 3605
9469 9491 CCTCAGAAGTTTTTTTCTTCGCA 1837 TGCGAAGAAAAAAACTTCTG 3606
9505 9527 CCTTTTACCACTCCAGCCTAGCC 1838 GGCTAGGCTGGAGTGGTAAA 3607
9512 9534 CCACTCCAGCCTAGCCCCTACCC 1839 GGGTAGGGGCTAGGCTGGAG 3608
9517 9539 CCAGCCTAGCCCCTACCCCCCAA 1840 TTGGGGGGTAGGGGCTAGGC 3609
9521 9543 CCTAGCCCCTACCCCCCAATTAG 1841 CTAATTGGGGGGTAGGGGCT 3610
9526 9548 CCCCTACCCCCCAATTAGGAGGG 1842 CCCTCCTAATTGGGGGGTAG 3611
9527 9549 CCCTACCCCCCAATTAGGAGGGC 1843 GCCCTCCTAATTGGGGGGTA 3612
9528 9550 CCTACCCCCCAATTAGGAGGGCA 1844 TGCCCTCCTAATTGGGGGGT 3613
9532 9554 CCCCCCAATTAGGAGGGCACTGG 1845 CCAGTGCCCTCCTAATTGGG 3614
9533 9555 CCCCCAATTAGGAGGGCACTGGC 1846 GCCAGTGCCCTCCTAATTGG 3615
9534 9556 CCCCAATTAGGAGGGCACTGGCC 1847 GGCCAGTGCCCTCCTAATTG 3616
9535 9557 CCCAATTAGGAGGGCACTGGCCC 1848 GGGCCAGTGCCCTCCTAATT 3617
9536 9558 CCAATTAGGAGGGCACTGGCCCC 1849 GGGGCCAGTGCCCTCCTAAT 3618
9555 9577 CCCCCAACAGGCATCACCCCGCT 1850 AGCGGGGTGATGCCTGTTGG 3619
9556 9578 CCCCAACAGGCATCACCCCGCTA 1851 TAGCGGGGTGATGCCTGTTG 3620
9557 9579 CCCAACAGGCATCACCCCGCTAA 1852 TTAGCGGGGTGATGCCTGTT 3621
9558 9580 CCAACAGGCATCACCCCGCTAAA 1853 TTTAGCGGGGTGATGCCTGT 3622
9571
9593CCCCGCTAAATCCCCTA 1854 GACTTCTAGGGGATTT 3623 GAAGTC AGCG 9572 9594CCCGCTAAATCCCCTAG 1855 GGACTTCTAGGGGATT 3624 AAGTCC TAGC 9573 9595CCGCTAAATCCCCTAGA 1856 GGGACTTCTAGGGGAT 3625 AGTCCC TTAG 9582 9604CCCCTAGAAGTCCCACT 1857 TTTAGGAGTGGGACTT 3626 CCTAAA CTAG 9583 9605CCCTAGAAGTCCCACTC 1858 GTTTAGGAGTGGGACT 3627 CTAAAC TCTA 9584 9606CCTAGAAGTCCCACTCC 1859 TGTTTAGGAGTGGGAC 3628 TAAACA TTCT 9593 9615CCCACTCCTAAACACAT 1860 ATACGGATGTGTTTAG 3629 CCGTAT GAGT 9594 9616CCACTCCTAAACACATC 1861 AATACGGATGTGTTTA 3630 CGTATT GGAG 9599 9621CCTAAACACATCCGTAT 1862 CGAGTAATACGGATGT 3631 TACTCG GTTT 9610 9632CCGTATTACTCGCATCA 1863 TACTCCTGATGCGAGT 3632 GGAGTA AATA 9640 9662CCTGAGCTCACCATAGT 1864 TATTAGACTATGGTGA 3633 CTAATA GCTC 9650 9672CCATAGTCTAATAGAAA 1865 GGTTGTTTTCTATTAG 3634 ACAACC ACTA 9671 9693CCGAAACCAAATAATTC 1866 GTGCTTGAATTATTTG 3635 AAGCAC GTTT 9677 9699CCAAATAATTCAAGCAC 1867 TAAGCAGTGCTTGAAT 3636 TGCTTA TATT 9727 9749CCCTCCTACAAGCCTCA 1868 GTACTCTGAGGCTTGT 3637 GAGTAC AGGA 9728 9750CCTCCTACAAGCCTCAG 1869 AGTACTCTGAGGCTTG 3638 AGTACT TAGG 9731 9753CCTACAAGCCTCAGAGT 1870 CGAAGTACTCTGAGGC 3639 ACTTCG TTGT 9739 9761CCTCAGAGTACTTCGAG 1871 CGAAGTACTCTGAGGC 3640 TCTCCC CTCTG 9759 9781CCCTTCACCATTTCCGA 1872 ATGCCGTCGGAAATGG 3641 CGGCAT TGAA 9760 9782CCTTCACCATTTCCGAC 1873 GATGCCGTCGGAAATG 3642 GGCATC GTGA 9766 9788CCATTTCCGACGGCATC 1874 GCCGTAGATGCCGTCG 3643 TACGGC GAAA 9772 9794CCGACGGCATCTACGGC 1875 TGTTGAGCCGTAGATG 3644 TCAACA CCGT 9805 9827CCACAGGCTTCCACGGA 1876 GTGAAGTCCGTGGAAG 3645 CTTCAC CCTG 9815 9837CCACGGACTTCACGTCA 1877 CAATAATGACGTGAAG 3646 TTATTG TCCG 9848 9870CCTCACTATCTGCTTCA 1878 GGCGGATGAAGCAGA 3647 TCCGCC TAGTG 9866 9888CCGCCAACTAATATTTC 1879 TAAAGTGAAATATTAG 3648 ACTTTA TTGG 9869 9891CCAACTAATATTTCACT 1880 ATGTAAAGTGAAATAT 3649 TTACAT TAGT 9892 9914CCAAACATCACTTTGGC 1881 TTCGAAGCCAAAGTGA 3650 TTCGAA TGTT 9916 9938CCGCCGCCTGATACTGG 1882 AAAATGCCAGTATCAG 3651 CATTTT GCGG 9919 9941CCGCCTGATACTGGCAT 1883 TACAAAATGCCAGTAT 3652 TTTGTA CAGG 9922 
9944CCTGATACTGGCATTTT 1884 ATCTACAAAATGCCAG 3653 GTAGAT TATC 9970 9992CCATCTATTGATGAGGG 1885 GTAAGACCCTCATCAA 3654 TCTTAC TAGA 10012 10034CCGTTAACTTCCAATTA 1886 ACTAGTTAATTGGAG 3655 ACTAGT TTAA 10022 10044CCAATTAACTAGTTTTG 1887 TGTTGTCAAAACTAGT 3656 ACAACA TAAT 10069 10091CCTTAATTTTAATAATC 1888 GGTGTTGATTATTAAA 3657 AACACC ATTA 10090 10112CCCTCCTAGCCTTACTA 1889 TATTAGTAGTAAGGCT 3658 CTAATA AGGA 10091 10113CCTCCTAGCCTTACTAC 1890 TTATTAGTAGTAAGGC 3659 TAATAA TAGG 10094 10116CCTAGCCTTACTACTAA 1891 TAATTATTAGTAGTAA 3660 TAATTA GGCT 10099 10121CCTTACTACTAATAATT 1892 TGTAATAATTATTAGT 3661 ATTACA AGTA 10131 10153CCACAACTCAACGGCT 1893 TCTATGTAGCCGTTGA 3662 CATAGA GTTG 10159 10181CCACCCCTTACGAGTGC 1894 GAAGCCGCACTCGTAA 3663 GGCTTC GGGG 10162 10184CCCCTTACGAGTGCGGC 1895 GTCGAAGCCGCACTCG 3664 TTCGAC TAAG 10163 10185CCCTTACGAGTGCGGCT 1896 GGTCGAAGCCGCACTC 3665 TCGACC GTAA 10164 10186CCTTACGAGTGCGGCTT 1897 GGGTCGAAGCCGCACT 3666 CGACCC CGTA 10184 10206CCCTATATCCCCCGCCC 1898 GGACGCGGGCGGGGG 3667 GCGTCC ATATA 10185 10207CCTATATCCCCCGCCCG 1899 GGGACGCGGGCGGGG 3668 CGTCCC GATAT 10192 10214CCCCCGCCCGCGTCCCT 1900 GGAGAAAGGGACGCG 3669 TTCTCC GGCGG 10193 10215CCCCGCCCGCGTCCCTT 1901 TGGAGAAAGGGACGC 3670 TCTCCA GGGCG 10194 10216CCCGCCCGCGTCCCTTT 1902 ATGGAGAAAGGGACG 3671 CTCCAT CGGGC 10195 10217CCGCCCGCGTCCCTTTC 1903 TATGGAGAAAGGGAC 3672 TCCATA GCGGG 10198 10220CCCGCGTCCCTTTCTCC 1904 TTTTATGGAGAAAGGG 3673 ATAAAA ACGC 10199 10221CCGCGTCCCTTTCTCCA 1905 ATTTTATGGAGAAAGG 3674 TAAAAT GACG 10205 10227CCCTTTCTCCATAAAAT 1906 AGAAGAATTTTATGGA 3675 TCTTCT GAAA 10206 10228CCTTTCTCCATAAAATT 1907 AAGAAGAATTTTATGG 3676 CTTCTT AGAA 10213 10235CCATAAAATTCTTCTTA 1908 AGCTACTAAGAAGAAT 3677 GTAGCT TTTA 10240 10262CCTTCTTATTATTTGATC 1909 TTCTAGATCAAATAAT 3678 TAGAA AAGA 10267 10289CCCTCCTTTTACCCCTAC 1910 TCATGGTAGGGGTAAA 3679 CATGA AGGA 10268 10290CCTCCTTTTACCCCTACC 1911 CTCATGGTAGGGGTAA 3680 ATGAG AAGG 10271 10293CCTTTTACCCCTACCAT 1912 GGGCTCATGGTAGGGG 3681 GAGCCC TAAA 10278 10300CCCCTACCATGAGCCCT 1913 
GTTTGTAGGGCTCATG 3682 ACAAAC GTAG 10279 10301CCCTACCATGAGCCCTA 1914 TGTTTGTAGGGCTCAT 3683 CAAACA GGTA 10280 10302CCTACCATGAGCCCTAC 1915 TTGTTTGTAGGGCTCA 3684 AAACAA TGGT 10284 10306CCATGAGCCCTACAAAC 1916 TTAGTTGTTTGTAGGG 3685 AACTAA CTCA 10291 10313CCCTACAAACAACTAAC 1917 TGGCAGGTTAGTTGTT 3686 CTGCCA TGTA 10292 10314CCTACAAACAACTAACC 1918 GTGGCAGGTTAGTTGT 3687 TGCCAC TTGT 10307 10329CCTGCCACTAATAGTTA 1919 ATGACATAACTATTAG 3688 TGTCAT TGGC 10311 10333CCACTAATAGTTATGTC 1920 AGGGATGACATAACTA 3689 ATCCCT TTAG 10330 10352CCCTCTTATTAATCATC 1921 TAGGATGATGATTAAT 3690 ATCCTA AAGA 10331 10353CCTCTTATTAATCATCA 1922 CTAGGATGATGATTAA 3691 TCCTAG TAAG 10349 10371CCTAGCCCTAAGTCTGG 1923 CATAGCCAGACTTAG 3692 CCTATG GGCT 10354 10376CCCTAAGTCTGGCCTAT 1924 TCACTCATAGGCCAGA 3693 GAGTGA CTTA 10355 10377CCTAAGTCTGGCCTATG 1925 GTCACTCATAGGCCAG 3694 AGTGAC ACTT 10366 10388CCTATGAGTGACTACAA 1926 TCCTTTTTGTAGTCACT 3695 AAAGGA CAT 10399 10421CCGAATTGGTATATAGT 1927 GTTTAAACTATATACC 3696 TTAAAC AATT 10466 10488CCAAATGCCCCTCATTT 1928 TTATGTAAATGAGGGG 3697 ACATAA CATT 10473 10495CCCCTCATTTACATAAA 1929 ATAATATTTATGTAAA 3698 TATTAT TGAG 10474 10496CCCTCATTTACATAAAT 1930 TATAATATTTATGTAA 3699 ATTATA ATGA 10475 10497CCTCATTTACATAAATA 1931 GTATAATATTTATGTA 3700 TTATAC AATG 10507 10529CCATCTCACTTCTAGGA 1932 TAGTATTCCTAGAAGT 3701 ATACTA GAGA 10544 10566CCTCATATCCTCCCTAC 1933 GGCATAGTAGGGAGG 3702 TATGCC ATATG 10552 10574CCTCCCTACTATGCCTA 1934 TCCTTCTAGGCATAGT 3703 GAAGGA AGGG 10555 10577CCCTACTATGCCTAGAA 1935 TATTCCTTCTAGGCAT 3704 GGAATA AGTA 10556 10578CCTACTATGCCTAGAAG 1936 TTATTCCTTCTAGCTCA 3705 GAATAA TAGT 10565 10587CCTAGAAGGAATAATAC 1937 GCGATAGTATTATTCC 3706 TATCGC TTCT 10612 10634CCCTCAACACCCACTCC 1938 TAAGAGGGAGTGGGT 3707 CTCTTA GTTGA 10613 10635CCTCAACACCCACTCCC 1939 CTAAGAGGGAGTGGG 3708 TCTTAG TGTTG 10621 10643CCCACTCCCTCTTAGCC 1940 AATATTGGCTAAGAGG 3709 AATATT GAGT 10622 10644CCACTCCCTCTTAGCCA 1941 CAATATTGGCTAAGAG 3710 ATATTG GGAG 10627 10649CCCTCTTAGCCAATATT 1942 AGGCACAATATTGGCT 3711 
GTGCCT AAGA 10628 10650CCTCTTAGCCAATATTG 1943 TAGGCACAATATTGGC 3712 TGCCTA TAAG 10636 10658CCAATATTGTGCCTATT 1944 TATGGCAATAGGCACA 3713 GCCATA ATAT 10647 10669CCTATTGCCATACTAGT 1945 GCAAAGACTAGTATGG 3714 CTTTGC CAAT 10654 10676CCATACTAGTCTTTGCC 1946 GCAGGCGGCAAAGAC 3715 GCCTGC TAGTA 10669 10691CCGCCTGCGAAGCAGCG 1947 GCCCACCGCTGCTTCG 3716 GTGGGC CAGG 10672 10694CCTGCGAAGCAGCGGTG 1948 TAGGCCCACCGCTGCT 3717 GGCCTA TCGC 10691 10713CCTAGCCCTACTAGTCT 1949 AGATTGAGACTAGTAG 3718 CAATCT GGCT 10696 10718CCCTACTAGTCTCAATC 1950 GTTGGAGATTGAGACT 3719 TCCAAC AGTA 10697 10719CCTACTAGTCTCAATCT 1951 TGTTGGAGATTGAGAC 3720 CCAACA TAGT 10714 10736CCAACACATATGGCCTA 1952 GTAGTCTAGGCCATAT 3721 GACTAC GTGT 10727 10749CCTAGACTACGTACATA 1953 TTAGGTTATGTACGTA 3722 ACCTAA GTCT 10745 10767CCTAAACCTACTCCAAT 1954 TTTAGCATTGGAGTAG 3723 GCTAAA GTTT 10751 10773CCTACTCCAATGCTAAA 1955 ATTAGTTTTAGCATTG 3724 ACTAAT GAGT 10757 10779CCAATGCTAAAACTAAT 1956 GGGACGATTAGTTTTA 3725 CGTCCC GCAT 10777 10799CCCAACAATTATATTAC 1957 GTGGTAGTAATATAAT 3726 TACCAC TGTT 10778 10800CCAACAATTATATTACT 1958 AGTGGTAGTAATATAA 3727 ACCACT TTGT 10796 10818CCACTGACATGACTTTC 1959 TTTTTGGAAAGTCATG 3728 CAAAAA TCAG 10812 10834CCAAAAAACACATAATT 1960 GATTCAAATTATGTGT 3729 TGAATC TTTT 10842 10864CCACCCACAGCCTAATT 1961 GCTAATAATTAGGCTG 3730 ATTAGC TGGG 10845 10867CCCACAGCCTAATTATT 1962 GATGCTAATAATTAGG 3731 AGCATC CTGT 10846 10868CCACAGCCTAATTATTA 1963 TGATGCTAATAATTAG 3732 GCATCA GCTG 10852 10874CCTAATTATTAGCATCA 1964 GAGGGATGATGCTAAT 3733 TCCCTC AATT 10870 10892CCCTCTACTATTTTTTAA 1965 TTTGGTTAAAAAATAG 3734 CCAAA TAGA 10871 10893CCTCTACTATTTTTTAAC 1966 ATTTGGTTAAAAAATA 3735 CAAAT GTAG 10888 10910CCAAATCAACAACAACC 1967 TAAATAGGTTGTTGTT 3736 TATTTA GATT 10903 10925CCTATTTAGCTGTTCCC 1968 AGGTTGGGGAACAGCT 3737 CAACCT AAAT 10917 10939CCCCAACCTTTTCCTCC 1969 GGGGTCGGAGGAAAA 3738 GACCCC GGTTG 10918 10940CCCAACCTTTTCCTCCG 1970 GGGGGTCGGAGGAAA 3739 ACCCCC AGGTT 10919 10941CCAACCTTTTCCTCCGA 1971 AGGGGGTCGGAGGAA 3740 CCCCCT AAGGT 10923 
10945CCTTTTCCTCCGACCCC 1972 TGTTAGGGGGTCGGAG 3741 CTAACA GAAA 10929 10951CCTCCGACCCCCTAACA 1973 GGGGGTTGTTAGGGGG 3742 ACCCCC TCGG 10932 10954CCGACCCCCTAACAACC 1974 GAGGGGGGTTGTTAGG 3743 CCCCTC GGGT 10936 10958CCCCCTAACAACCCCCC 1975 TTAGGAGGGGGGTTGT 3744 TCCTAA TAGG 10937 10959CCCCTAACAACCCCCCT 1976 ATTAGGAGGGGGGTTG 3745 CCTAAT TTAG 10938 10960CCCTAACAACCCCCCTC 1977 TATTAGGAGGGGGGTT 3746 CTAATA GTTA 10939 10961CCTAACAACCCCCCTCC 1978 GTATTAGGAGGGGGGT 3747 TAATAC TGTT 10947 10969CCCCCCTCCTAATACTA 1979 GGTAGTTAGTATTAGG 3748 ACTACC AGGG 10948 10970CCCCCTCCTAATACTAA 1980 AGGTAGTTAGTATTAG 3749 CTACCT GAGG 10949 10971CCCCTCCTAATACTAAC 1981 CAGGTAGTTAGTATTA 3750 TACCTG GGAG 10950 10972CCCTCCTAATACTAAC 1982 TCAGGTAGTTAGTATT 3751 ACCTGA AGGA 10951 10973CCTCCTAATACTAACTA 1983 GTCAGGTAGTTAGTAT 3752 CCTGAC TAGG 10954 10976CCTAATACTAACTACCT 1984 GGAGTCAGGTAGTTAG 3753 GACTCC TATT 10968 10990CCTGACTCCTACCCCTC 1985 GATTGTGAGGGGTAGG 3754 ACAATC AGTC 10975 10997CCTACCCCTCACAATCA 1986 TTGCCATGATTGTGAG 3755 TGGCAA GGGT 10979 11001CCCCTCACAATCATGGC 1987 TGGCTTGCCATGATTG 3756 AAGCCA TGAG 10980 11002CCCTCACAATCATGGCA 1988 TTGGCTTGCCATGATT 3757 AGCCAA GTGA 10981 11003CCTCACAATCATGGCAA 1989 GTTGGCTTGCCATGAT 3758 GCCAAC GTG 10999 11021CCAACGCCACTTATCCA 1990 GTTCACTGGATAAGTG 3759 GTGAAC GCGT 11005 11027CCACTTATCCAGTGAAC 1991 ATAGTGGTTCACTGGA 3760 CACTAT TAAG 11013 11035CCAGTGAACCACTATCA 1992 TTTTCGTGATAGTGGT 3761 CGAAAA TCAC 11021 11043CCACTATCACGAAAAAA 1993 TAGAGTTTTTTTCGTG 3762 ACTCTA ATAG 11044 11066CCTCTCTATACTAATCT 1994 GTAGGGAGATTAGTAT 3763 CCCTAC AGAG 11061 11083CCCTACAAATCTCCTTA 1995 TATAATTAAGGAGATT 3764 ATTATA TGTA 11062 11084CCTACAAATCTCCTTAA 1996 TTATAATTAAGGAGAT 3765 TTATAA TTGT 11073 11095CCTTAATTATAACATTC 1997 GGCTGTGAATGTTATA 3766 ACAGCC ATTA 11094 11116CCACAGAACTAATCATA 1998 ATAAAATATGATTAGT 3767 TTTTAT TCTG 11130 11152CCACACTTATCCCCACC 1999 AGCCAAGGTGGGGAT 3768 TTGGCT AAGTG 11140 11162CCCCACCTTGGCTATCA 2000 GGGTGATGATAGCCAA 3769 TCACCC GGTG 11141 11163CCCACCTTGGCTATCAT 2001 
CGGGTGATGATAGCCA 3770 CACCCG AGGT 11142 11164CCACCTTGGCTATCATC 2002 TCGGGTGATGATAGCC 3771 ACCCGA AAGG 11145 11167CCTTGGCTATCATCACC 2003 TCATCGGGTGATGATA 3772 CGATGA GCCA 11160 11182CCCGATGAGGCAACCA 2004 TTCTGGCTGGTTGCCT 3773 GCCAGAA CATC 11161 11183CCGATGAGGCAACCAG 2005 GTTCTGGCTGGTTGCC 3774 CCAGAAC TCAT 11173 11195CCAGCCAGAACGCCTGA 2006 CTGCGTTCAGGCGTTC 3775 ACGCAG TGGC 11177 11199CCAGAACGCCTGAACGC 2007 GTGCCTGCGTTCAGGC 3776 AGGCAC GTTC 11185 11207CCTGAACGCAGGCACAT 2008 GGAAGTATGTGCCTGC 3777 ACTTCC GTTC 11206 11228CCTATTCTACACCCTAG 2009 AGCCTACTAGGGTGTA 3778 TAGGCT GAAT 11217 11239CCCTAGTAGGCTCCCTT 2010 TAGGGGAAGGGAGCC 3779 CCCCTA TACTA 11218 11240CCTAGTAGGCTCCCTTC 2011 GTAGGGGAAGGGAGC 3780 CCCTAC CTACT 11229 11251CCCTTCCCCTACTCATC 2012 TAGTGCGATGAGTAGG 3781 GCACTA GGAA 11230 11252CCTTCCCCTACTCATCG 2013 TTAGTGCGATGAGTAG 3782 CACTAA GGGA 11234 11256CCCCTACTCATCGCACT 2014 TAAATTAGTGCGATGA 3783 AATTTA GTAG 11235 11257CCCTACTCATCGCACTA 2015 GTAAATTAGTGCGATG 3784 ATTTAC AGTA 11236 11258CCTACTCATCGCACTAA 2016 TGTAAATTAGTGCGAT 3785 TTTACA GAGT 11268 11290CCCTAGGCTCACTAAAC 2017 TAGAATGTTTAGTGAG 3786 ATTCTA CCTA 11269 11291CCTAGGCTCACTAAACA 2018 GTAGAATGTTTAGTGA 3787 TTCTAC GCCT 11307 11329CCCAAGAACTATCAAAC 2019 TCAGGAGTTTGATAGT 3788 TCCTGA TCTT 11308 11330CCAAGAACTATCAAACT 2020 CTCAGGAGTTTGATAG 3789 CCTGAG TTCT 11325 11347CCTGAGCCAACAACTTA 2021 TCATATTAAGTTGTTG 3790 ATATGA GCTC 11331 11353CCAACAACTTAATATGA 2022 AGCTAGTCATATTAAG 3791 CTAGCT TTGT 11381 11403CCTCTTTACGGACTCCA 2023 CATAAGTGGAGTCCGT 3792 CTTTATG AAAG 11395 11417CCACTTATGACTCCCTA 2024 GGGCTTTAGGGAGTCA 3793 AAGCCC TAAG 11407 11429CCCTAAAGCCCATGTCG 2025 GGGCTTCGACATGGGC 3794 AAGCCC TTTA 11408 11430CCTAAAGCCCATGTCGA 2026 GGGGCTTCGACATGGG 3795 AGCCCC CTTT 11415 11437CCCATGTCGAAGCCCCC 2027 AGCGATGGGGGCTTCG 3796 ATCGCT ACAT 11416 11438CCATGTCGAAGCCCCCA 2028 CAGCGATGGGGGCTTC 3797 TCGCTG GACA 11427 11449CCCCCATCGCTGGGTCA 2029 TACTATTGACCCAGCG 3798 ATAGTA ATGG 11428 11450CCCCATCGCTGGGTCAA 2030 GTACTATTGACCCAGC 3799 
TAGTAC GATG 11429 11451CCCATCGCTGGGTCAAT 2031 AGTACTATTGACCCAG 3800 AGTACT CGAT 11430 11452CCATCGCTGGGTCAATA 2032 AAGTACTATTGACCCA 3801 GTACTT GCGA 11454 11476CCGCAGTACTCTTAAAA 2033 GCCTAGTTTTAAGAGT 3802 CTAGGC ACTG 11494 11516CCTCACACTCATTCTCA 2034 GGGGGTTGAGAATGA 3803 ACCCCC GTGTG 11512 11534CCCCCTGACAAAACACA 2035 AGGCTATGTGTTTTGT 3804 TAGCCT CAGG 11513 11535CCCCTGACAAAACACAT 2036 TAGGCTATGTGTTTTG 3805 AGCCTA TCAG 11514 11536CCCTGACAAAACACATA 2037 GTAGGCTATGTGTTTT 3806 GCCTAC GTCA 11515 11537CCTGACAAAACACATAG 2038 GGTAGGCTATGTGTTT 3807 CCTACC TGTC 11532 11554CCTACCCCTTTCCTGTA 2039 GGATAGTACAAGGAA 3808 CTATCC GGGGT 11536 11558CCCCTTCCTTGTACTATC 2040 ATAGGGATAGTACAA 3809 CCTAT GGAAG 11537 11559CCCTTCCTTGTACTATCC 2041 CATAGGGATAGTACAA 3810 CTATG GGAA 11538 11560CCTTCCTTGTACTATCCC 2042 TCATAGGGATAGTACA 3811 TATGA AGGA 11542 11564CCTTGTACTATCCCTAT 2043 TGCCTCATAGGGATAG 3812 GAGGCA TACA 11553 11575CCCTATGAGGCATAATT 2044 TGTTATAATTATGCCT 3813 ATAACA CATA 11554 11576CCTATGAGGCATAATTA 2045 TTGTTATAATTATGCC 3814 TAACAA TCAT 11580 11602CCATCTGCCTACGACAA 2046 GTCTGTTTGTCGTAGG 3815 ACAGAC CAGA 11587 11609CCTACGACAAACAGACC 2047 ATTTTAGGTCTGTTTGT 3816 TAAAAT CGT 11602 11624CCTAAAATCGCTCATTG 2048 AGTATGCAATGAGCGA 3817 CATACT TTTT 11635 11657CCACATAGCCCTCGTAG 2049 CTGTTACTACGAGGGC 3818 TAACAG TATG 11643 11665CCCTCGTAGTAACAGCC 2050 GAGAATGGCTGTTACT 3819 ATTCTC ACGA 11644 11666CCTCGTAGTAACAGCCA 2051 TGAGAATGGCTGTTAC 3820 TTCTCA TACG 11658 11680CCATTCTCATCCAAACC 2052 TCAGGGGGTTTGGATG 3821 CCCTGA AGAA 11668 11690CCAAACCCCCTGAAGCT 2053 CGGTGAAGCTTCAGGG 3822 TCACCG GGTT 116173 11695CCCCCTGAAGCTTCACC 2054 TGCGCCGGTGAAGCTT 3823 GGCGCA CAGG 11674 11696CCCCTGAAGCTTCACCG 2055 CTGCGCCGGTGAAGCT 3824 GCGCAG TCAG 11675 11697CCCTGAAGCTTCACCGG 2056 ACTGCGCCGGTGAAGC 3825 CGCAGT TTCA 11676 11698CCTGAAGCTTCACCGGC 2057 GACTGCGCCGGTGAAG 3826 GCAGTC CTTC 11688 11710CCGGCGCAGTCATTCTC 2058 GATTATGAGAATGACT 3827 ATAATC GCGC 11712 11734CCCACGGGCTTACATCC 2059 TAATGAGGATGTAAGC 3828 TCATTA CCGT 11713 
11735CCACGGGCTTACATCCT 2060 GTAATGAGGATGTAAG 3829 CATTAC CCCG 11727 11749CCTCATTACTATTCTGC 2061 TGCTAGGCAGAATAGT 3830 CTAGCA AATG 11743 11765CCTAGCAAACTCAAACT 2062 GTTCGTAGTTTGAGTT 3831 ACGAAC TGCT 11788 11810CCTCTCTCAAGGACTTC 2063 GAGTTTGAAGTCCTTG 3832 AAACTC AGAG 11815 11837CCCACTAATAGCTTTTT 2064 GTCATCAAAAAGCTAT 3833 GATGAC TAGT 11816 11838CCACTAATAGCTTTTTG 2065 AGTCATCAAAAAGCTA 3834 ATGACT TTAG 11870 11848CCTCGCTAACCTCGCCT 2066 GGGGTAAGGCGAGGT 3835 TACCCC TAGCG 11857 11879CCTCGCCTTACCCCCCA 2067 TAATAGTGGGGGGTAA 3836 CTATTA GGCG 11862 11884CCTTACCCCCCACTATT 2068 TAGGTTAATAGTGGGG 3837 AACCTA GGTA 11867 11889CCCCCCACTATTAACCT 2069 CCCAGTAGGTTAATAG 3838 ACTGGG TGGG 11868 11890CCCCCACTATTAACCTA 2070 TCCCAGTAGGTTAATA 3839 CTGGGA GTGG 11869 11891CCCCACTATTAACCTAC 2071 CTCCCAGTAGGTTAAT 3840 TGGGAG AGTG 11870 11892CCCACTATTAACCTACT 2072 TCTCCCAGTAGGTTAA 3841 GGGAGA TAGT 11871 11893CCACTATTAACCTACTG 2073 TTCTCCCAGTAGGTTA 3842 GGAGAA ATAG 11881 11903CCTACTGGGAGAACTCT 2074 GCACAGAGAGTTCTCC 3843 CTGTGC CAGT 11910 11932CCACGTTCTCCTGATCA 2075 GATATTTGATCAGGAG 3844 AATATC AACG 11919 11941CCTGATCAAATATCACT 2076 TAGGAGAGTGATATTT 3845 CTCCTA GATC 11938 11960CCTACTTACAGGACTCA 2077 GTATGTTGAGTCCTGT 3846 ACATAC AAGT 11970 11992CCCTATACTCCCTCTAC 2078 AAATATGTAGAGGGA 3847 ATATTT GTATA 11971 11993CCTATACTCCCTCTACA 2079 TAAATATGTAGAGGGA 3848 TATTTA GTAT 11979 12001CCCTCTACATATTTACC 2080 TGTTGTGGTAAATATG 3849 ACAACA TAGA 11980 12002CCTCTACATATTTACCA 2081 GTGTTGTGGTAAATAT 3850 CAACAC GTAG 11994 12016CCACAACACAATGGGG 2082 GAGTGAGCCCCATTGT 3851 CTCACTC GTTG 12018 12040CCCACCACATTAACAAC 2083 TTTTATGTTGTTAATGT 3852 ATAAAA GGT 12019 12041CCACCACATTAACAACA 2084 GTTTTATGTTGTTAAT 3853 TAAAAC GTGG 12022 12044CCACATTAACAACATAA 2085 AGGGTTTTATGTTGTT 3854 AACCCT AATG 12041 12063CCCTCATTCACACGAGA 2086 GTGTTTTCTCGTGTGA 3855 AAACAC ATGA 12042 12064CCTCATTCACACGAGAA 2087 GGTGTTTTTTCGTGTG 3856 AACACC AATG 12063 12085CCCTCATGTTCATACAC 2088 GGATAGGTGTATGAAC 3857 CTATCC ATGA 12064 12086CCTCATGTTCATACACC 2089 
GGGATAGGTGTATGAA 3858 TATCCC CATG 12079 12101CCTATCCCCCATTCTCCT 2090 ATAGGAGGAGAATGG 3859 CCTAT GGGAT 12084 12106CCCCCATTCTCCTCCTAT 2091 GAGGGATAGGAGGAG 3860 CCCTC AATGG 12085 12107CCCCATTCTCCTCCTATC 2092 TGAGGGATAGGAGGA 3861 CCTCA GAATG 12086 12108CCCATTCTCCTCCTATCC 2093 TTGAGGGATAGGAGG 3862 CTCAA AGAAT 12087 12109CCATTCTCCTCCTATCCC 2094 GTTGAGGGATAGGAG 3863 TCAAC GAGAA 12094 12116CCTCCTATCCCTCAACC 2095 TGTCGGGGTTGAGGGA 3864 CCGACA TAGG 12097 12119CCTATCCCTCAACCCCG 2096 TGATGTCGGGGTTGAG 3865 ACATCA GGAT 12102 12124CCCTCAACCCCGACATC 2097 GGTAATGATGTCGGGG 3866 ATTACC TTGA 12103 12125CCTCAACCCCGACATCA 2098 CGGTAATGATGTCGGG 3867 TTACCG GTTG 12109 12131CCCCGACATCATTACCG 2099 AAAACCCGGTAATGAT 3868 GGTTTT GTCG 12110 12132CCCGACATCATTACCGG 2100 GAAAACCCGGTAATG 3869 GTTTTC ATGTC 12111 12133CCGACATCATTACCGGG 2101 GGAAAACCCGGTAAT 3870 TTTTCC GATGT 12123 12145CCGGGTTTTCCTCTTGT 2102 ATATTTACAAGAGGAA 3871 AAATAT AACC 12132 12154CCTCTTGTAAATATAGT 2103 GGTTAAACTATATTTA 3872 TTAACC CAAG 12153 12175CCAAAACATCAGATTGT 2104 AGATTCACAATCTGAT 3873 GAATCT GTTT 12194 12216CCCCTTATTTACCGAGA 2105 GAGCTTTCTCGGTAAA 3874 AAGCTC TAAG 12195 12217CCCTTATTTACCGAGAA 2106 TGAGCTTTCTCGGTAA 3875 AGCTCA ATAA 12196 12218CCTTATTTACCGAGAAA 2107 GTGAGCTTTCTCGGTA 3876 GCTCAC AATA 12205 12227CCGAGAAAGCTCACAA 2108 GCAGTTCTTGTGAGCT 3877 GAACTGC TTCT 12237 12259CCCCCATGTCTAACAAC 2109 AGCCATGTTGTTAGAC 3878 ATGGCT ATGG 12238 12260CCCCATGTCTAACAACA 2110 AAGCCATGTTGTTAGA 3879 TGGCTT CATG 12239 12261CCCATGTCTAACAACAT 2111 AAAGCCATGTTGTTAG 3880 GGCTTT ACAT 12240 12262CCATGTCTAACAACATG 2112 GAAAGCCATGTTGTTA 3881 GCTTTC GACA 12288 12310CCATTGGTCTTAGGCCC 2113 TTTTTGGGGCCTAAGA 3882 CAAAAA CCAA 12302 12324CCCCAAAAATTTTGGTG 2114 GAGTTGCACCAAAATT 3883 CAACTC TTTG 12303 12325CCCAAAAATTTTGGTGC 2115 GGAGTTGCACCAAAAT 3884 AACTCC TTTT 12304 12326CCAAAAATTTTGGTGCA 2116 TGGAGTTGCACCAAAA 3885 ACTCCA TTTT 12324 12346CCAAATAAAAGTAATA 2117 GCATGGTTATTACTTT 3886 ACCATGC TATT 12341 12363CCATGCACACTACTATA 2118 GGTGGTTATAGTAGTG 3887 
ACCACC TGCA 12359 12381CCACCCTAACCCTGACT 2119 TAGGGAAGTCAGGGTT 3888 TCCCTA AGGG 12362 12384CCCTAACCCTGACTTCC 2120 AATTAGGGAAGTCAG 3889 CTAATT GGTTA 12363 12385CCTAACCCTGACTTCCC 2121 GAATTAGGGAAGTCA 3890 TAATTC GGGTT 12368 12390CCCTGACTTCCCTAATT 2122 GGGGGGAATTAGGGA 3891 CCCCCC AGTCA 12369 12391CCTGACTTCCCTAATTC 2123 TGGGGGGAATTAGGG 3892 CCCCCA AAGTC 12377 12399CCCTAATTCCCCCCATC 2124 CTGTAAGGATGGGGGG 3893 CTTACC AATTA 12378 12400CCTAATTCCCCCCATCC 2125 TGGTAAGGATGGGGG 3894 TTACCA GAATT 12385 12407CCCCCCATCCTTACCAC 2126 ACGAGGGTGGTAAGG 3895 CCTCGT ATGGG 12386 12408CCCCCATCCTTACCACC 2127 AACGAGGGTGGTAAG 3896 CTCGTT GATGG 12387 12409CCCCATCCTTACCACCC 2128 TAACGAGGGTGGTAA 3897 TCGTTA GGATG 12388 12410CCCATCCTTACCACCCT 2129 TTAACGAGGGTGGTAA 3898 CGTTAA GGAT 12389 12411CCATCCTTACCACCCTC 2130 GTTAACGAGGGTGGTA 3899 GTTAAC AGGA 12393 12415CCTTACCACCCTCGTTA 2131 TAGGGTTAACGAGGGT 3900 ACCCTA GGTA 12398 12420CCACCCTCGTTAACCCT 2132 TTTGTTAGGGTTAACG 3901 AACAAA AGGG 12401 12423CCCTCGTTAACCCTAAC 2133 TTTTTTGTTAGGGTTA 3902 AAAAAA ACGA 12402 12424CCTCGTTAACCCTAACA 2134 TTTTTTTGTTAGGGTTA 3903 AAAAAA ACG 12411 12433CCCTAACAAAAAAAACT 2135 GGTATGAGTTTTTTTT 3904 CATACC GTTA 12412 12434CCTAACAAAAAAAACTC 2136 GGGTATGAGTTTTTTT 3905 ATACCC TGTT 12432 12454CCCCCATTATGTAAAAT 2137 CAATGGATTTTACATA 3906 CCATTG ATGG 12433 12455CCCCATTATGTAAAATC 2138 ACAATGGATTTTACAT 3907 CATTGT AATG 12434 12456CCCATTATGTAAAATCC 2139 GACATGGATTTTACA 3908 ATTGTC TAAT 12435 12457CCATTATGTAAAATCCA 2140 CGACAATGGATTTTAC 3909 TTGTCG ATAA 12449 12471CCATTGTCGCATCCACC 2141 AATAAAGGTGGATGC 3910 TTTATT GACAA 12461 12483CCACCTTTATTATCAGT 2142 GAAGAGACTGATAAT 3911 CTCTTC AAAGG 12464 12486CCTTTATTATCAGTCTCT 2143 GGGGAAGAGACTGAT 3912 TCCCC AATAA 12483 12505CCCCACAACAATATTCA 2144 GGCACATGAATATTGT 3913 TGTGCC TGTG 12484 12506CCCACAACAATATTCAT 2145 AGGCACATGAATATTG 3914 GTGCCT TTGT 12485 12507CCACAACAATATTCATG 2146 TAGGCACATGAATATT 3915 TGCCTA GTTG 12504 12526CCTAGACCAAGAAGTTA 2147 AGATAATAACTTCTTG 3916 TTATCT GTCT 12510 
12532CCAAGAAGTTATTATCT 2148 AGTTCGAGATAATAAC 3917 CGAACT TTCT 12542 12564CCACAACCCAAACAACC 2149 GAGCTGGGTTGTTTGG 3918 CAGCTC GTTG 12548 12570CCCAAACAACCCAGCTC 2150 TAGGGAGAGCTGGGTT 3919 TCCCTA GTTT 12549 12571CCAAACAACCCAGCTCT 2151 TTAGGGAGAGCTGGGT 3920 CCCTAA TGTT 12557 12579CCCAGCTCTCCCTAAGC 2152 TTTGAAGCTTAGGGAG 3921 TTCAAA AGCT 12558 12580CCAGCTCTCCCTAAGCT 2153 GTTTGAAGCTTAGGGA 3922 TCAAAC GAGC 12566 12588CCCTAACTTCAAACTA 2154 GTAGTCTAGTTTGAAG 3923 GACTAC CTTA 12567 12589CCTAAGCTTCAAACTAG 2155 AGTAGTCTAGTTTGAA 3924 ACTACT GCTT 12593 12615CCATAATATTCATCCCT 2156 TGCTACAGGGATGAAT 3925 GTAGCA ATTA 12606 12628CCCTGTAGCATTGTTCG 2157 ATGTAACGAACAATGC 3926 TTACAT ACA 12607 12629CCTGTAGCATTGTTCGT 2158 CATGTAACGAACAATG 3927 TACATG CTAC 12632 12654CCATCATAGAATTCTCA 2159 TCACAGTGAGAATTCT 3928 CTGTGA ATGA 12669 12691CCCAAACATTAATCAGT 2160 TGAAGAACTGATTAAT 3929 TCTTCA GTTT 12670 12692CCAAACATTAATCAGTT 2161 TTGAAGAACTGATTAA 3930 CTTCAA TGTT 12708 12730CCTAATTACCATACTAA 2162 CTAAGATTAGTATGGT 3931 TCTTAG AATT 12716 12738CCATACTAATCTTAGTT 2163 AGCGGTAACTAAGATT 3932 ACCGCT AGTA 12734 12756CCGCTAACAACCTATTC 2164 CAGTTGGAATAGGTTG 3933 CAACTG TTAG 12744 12766CCTATTCCAACTGTTCA 2165 AGCCGATGAACAGTTG 3934 TCGGCT GAAT 12750 12772CCAACTGTTCATCGGCT 2166 CCTTTCAGCCGATGAA 3935 GAGAGG CAGT 12788 12810CCTTCTTGCTCATCAGTT 2167 TCATCAACTGATGAGC 3936 GATGA AAGA 12815 12837CCCGAGCAGATGCCAAC 2168 TGCTGTGTTGGCATCT 3937 ACAGCA GCTC 12816 12838CCGAGCAGATGCCAAC 2169 CTGCTGTGTTGGCATC 3938 ACAGCAG TGCT 12827 12849CCAACACAGCAGCCATT 2170 TGCTTGAATGGCTGCT 3939 CAAGCA GTGT 12839 12861CCATTCAAGCAATCCTA 2171 GTTGTATAGGATTGCT 3940 TACAAC TGAA 12852 12874CCTATACAACCGTATCG 2172 TATCGCCGATACGGTT 3941 GCGATA GTAT 12861 12883CCGTATCGGCGATATCG 2173 TGAAAACCGATATCGCC 3942 GTTTCA GATA 12885 12907CCTCGCCTTAGCATGAT 2174 GGATAAATCATGCTAA 3943 TTATCC GGCG 12890 12912CCTTAGCATGATTTATC 2175 GTGTAGGATAAATCAT 3944 CTACAC GCTA 12906 12928CCTACACTCCAACTCAT 2176 GGTCTCATGAGTTGGA 3945 GAGACC GTGT 12914 12936CCAACTCATGAGACCCA 2177 
TTGTTGTGGGTCTCAT 3946 CAACAA GAGT 12927 12949CCCACAACAAATAGCCC 2178 TTAGAAGGGCTATTTG 3947 TTCTAA TTGT 12928 12950CCACAACAAATAGCCCT 2179 TTTAGAAGGGCTATTT 3948 TCTAAA GTTG 12941 12963CCCTTCTAAACGCTAAT 2180 GCTTGGATTAGCGTTT 3949 CCAAGC AGAA 12942 12964CCTTCTAAACGCTAATC 2181 GGCTTGGATTAGCGTT 3950 CAAGCC TAGA 12958 12980CCAAGCCTCACCCCACT 2182 CCTAGTAGTGGGGTGA 3951 ACTAGG GGCT 12963 12985CCTCACCCCACTACTAG 2183 GGAGGCCTAGTAGTGG 3952 GCCTCC GGTG 12968 12990CCCCACTACTAGGCCTC 2184 TAGGAGGAGGCCTAGT 3953 CTCCTA AGTG 12969 12991CCCACTACTAGGCCTCC 2185 CTAGGAGGAGGCCTA 3954 TCCTAG GTAGT 12970 12992CCACTACTAGGCCTCCT 2186 GCTAGGAGGAGGCCT 3955 CCTAGC AGTAG 12981 13003CCTCCTCCTAGCAGCAG 2187 TGCCTGCTGCTGCTAG 3956 CAGGCA GAGG 12984 13006CCTCCTAGCAGCAGCAG 2188 ATTTGCCTGCTGCTGC 3957 GCAAAT TAGG 12987 13009CCTAGCAGCAGCAGGC 2189 CTGATTTGCCTGCTGC 3958 AAATCAG TGCT 13010 13032CCCAATTAGGTCTCCAC 2190 TCAGGGGTGGAGACCT 3959 CCCTGA AATT 13011 13033CCAATTAGGTCTCCACC 2191 GTCAGGGGTGGAGAC 3960 CCTGAC CTAAT 13023 13045CCACCCCTGACTCCCCT 2192 TGGCTGAGGGGAGTCA 3961 CAGCCA GGGG 13026 13048CCCCTGACTCCCCTCAG 2193 CTATGGCTGAGGGGAG 3962 CCATAG TCAG 13027 13049CCCTGACTCCCCTCAGC 2194 TCTATGGCTGAGGGGA 3963 CATAGA GTCA 13028 13050CCTGACTCCCCTCAGCC 2195 TTCTATGGCTGAGGGG 3964 ATAGAA AGTC 13035 13057CCCCTCAGCCATAGAAG 2196 TGGGGCCTTCTATGGC 3965 GCCCCA TGAG 13036 13058CCCTCAGCCATAGAAGG 2197 GTGGGGCCTTCTATGG 3966 CCCCAC CTGA 13037 13059CCTCAGCCATAGAAGGC 2198 GGTGGGGCCTTCTATG 3967 CCCACC GCTG 13043 13065CCATAGAAGGCCCCACC 2199 GACTGGGGTGGGGCCT 3968 CCAGTC TCTA 13053 13075CCCCACCCCAGTCTCAG 2200 GTAGGGCTGAGACTGG 3969 CCCTAC GGTG 13054 13076CCCACCCCAGTCTCAGC 2201 AGTAGGGCTGAGACTG 3970 CCTACT GGGT 13055 13077CCACCCCAGTCTCAGCC 2202 GAGTAGGGCTGAGACT 3971 CTACTC GGGG 13058 13080CCCCAGTCTCAGCCCTA 2203 GTGGAGTAGGGCTGA 3972 CTCCAC GACTG 13059 13081CCCAGTCTCAGCCCTAC 2204 AGTGGAGTAGGGCTG 3973 TCCACT AGACT 13060 13082CCAGTCTCAGCCCTACT 2205 GAGTGGAGTAGGGCT 3974 CCACTC GAGAC 13070 13092CCCTACTCCACTCAAGC 2206 TATAGTGCTTGAGTGG 3975 
ACTATA AGTA 13071 13093CCTACTCCACTCAAGCA 2207 CTATAGTGCTTGAGTG 3976 CTATAG GAGT 13077 13099CCACTCAAGCACTATAG 2208 CTACAACTATAGTGCT 3977 TTGTAG TGAG 13119 13141CCGCTTCCACCCCCTAG 2209 TTTCTGCTAGGGGGTG 3978 CAGAAA GAAG 13125 13147CCACCCCCTAGCAGAAA 2210 GGCTATTTTCTGCTAG 3979 ATAGCC GGGG 13128 13150CCCCCTAGCAGAAAATA 2211 GTGGGCTATTTTCTGC 3980 GCCCAC TAGG 13129 13151CCCCTAGCAGAAAATAG 2212 AGTGGGCTATTTTCTG 3981 CCCACT CTAG 13130 13152CCCTAGCAGAAAATAGC 2213 TAGTGGGCTATTTTCT 3982 CCACTA GCTA 13131 13153CCTAGCAGAAAATAGCC 2214 TAGTGGCCTATTTTC 3983 CACTAA TGCT 13146 13168CCCACTAATCCAAACTC 2215 GTGTTAGAGTTTGGAT 3984 TAACAC TAGT 13147 13169CCACTAATCCAAACTCT 2216 AGTGTTAGAGTTTGGA 3985 AACACT TTAG 13155 13177CCAAACTCTAACACTAT 2217 CTAAGCATAGTGTTAG 3986 GCTTAG AGTT 13187 13209CCACTCTGTTCGCAGCA 2218 GCAGACTGCTGCGAAC 3987 GTCTGC AGAG 13211 13233CCCTTACACAAAATGAC 2219 TTTGATGTCATTTTGTG 3988 ATCAAA TAA 13212 13234CCTTACACAAAATGACA 2220 TTTTGATGTCATTTTGT 3989 TCAAAA GTA 13244 13266CCTTCTCCACTTCAAGT 2221 TAGTTGACTTGAAGTG 3990 CAACTA GAGA 13250 13272CCACTTCAAGTCAACTA 2222 GAGTCCTAGTTGACTT 3991 GGACTC GAAG 13296 13318CCAACCACACCTAGCAT 2223 GCAGGAATGCTAGGTG 3992 TCCTGC TGGT 13300 13322CCACACCTAGCATTCCT 2224 ATGTGCAGGAATGCTA 3993 GCACAT GGTG 13305 13327CCTAGCATTCCTGCACA 2225 TACAGATGTGCAGGAA 3994 TCTGTA TGCT 13314 13336CCTGCACATCTGTACCC 2226 AGGCGTGGGTACAGAT 3995 ACGCCT GTGC 13328 13350CCCACGCCTTCTTCAAA 2227 TATGGCTTTGAAGAAG 3996 GCCATA GCGT 13329 13351CCACGCCTTCTTCAAAG 2228 GTATGGCTTTGAAGAA 3997 CCATAC GGCG 13334 13356CCTTCTTCAAAGCCATA 2229 AAATAGTATGGCTTTG 3998 CTATTT AAGA 13346 13368CCATACTATTTATGTGC 2230 CCCGGAGCACATAAAT 3999 TCCGGG AGTA 13364 13386CCGGGTCCATCATCCAC 2231 AAGGTTGTGGATGATG 4000 AACCTT GACC 13370 13392CCATCATCCACAACCTT 2232 ATTGTTAAGGTTGTGG 4001 AACAAT ATGA 13377 13399CCACAACCTTAACAATG 2233 CTTGTTCATTGTTAAG 4002 AACAAG GTTG 13383 13405CCTTAACAATGAACAAG 2234 GAATATCTTGTTCATT 4003 ATATTC GTTA 13430 13452CCATACCTCTCACTTCA 2235 GGAGGTTGAAGTGAG 4004 ACCTCC AGGTA 13435 
13457CCTCTCACTTCAACCTC 2236 GTGAGGGAGGTTGAA 4005 CCTCAC GTGAG 13448 13470CCTCCCTCACCATTGGC 2237 TAGGCTGCCAATGGTG 4006 AGCCTA AGGG 13451 13473CCCTCACCATTGGCAGC 2238 TGCTAGGCTGCCAATG 4007 CTAGCA GTGA 13452 13474CCTCACCATTGGCAGCC 2239 ATGCTAGGCTGCCAAT 4008 TAGCAT GGTG 13457 13479CCATTGGCAGCCTAGCA 2240 TGCTAATGCTAGGCTG 4009 TTAGCA CCAA 13467 13489CCTAGCATTAGCAGGAA 2241 AAGGTATTCCTGCTAA 4010 TACCTT TGCT 13486 13508CCTTTCCTCACAGGTTT 2242 GAGTAGAAACCTGTGA 4011 CTACTC GGAA 13491 13513CCTCACAGGTTTCTACT 2243 CTTTGGAGTAGAAACC 4012 CCAAAG TGTG 13508 13530CCAAAGACCACATCATC 2244 GGTTTCGATGATGTGG 4013 GAAACC TCTT 13515 13537CCACATCATCGAAACCG 2245 TGTTTGCGGTTTCGAT 4014 CAAACA GATG 13529 13551CCGCAAACATATCATAC 2246 GTTTGTGTATGATATG 4015 ACAAAC TTTG 13553 13575CCTGAGCCCTATCTATT 2247 GAGAGTAATAGATAG 4016 ACTCTC GGCTC 13559 13581CCCTATCTATTACTCTC 2248 AGCGATGAGAGTAAT 4017 ATCGCT AGATA 13560 13582CCTATCTATTACTCTCAT 2249 TAGCGATGAGAGTAAT 4018 CGCTA AGAT 13583 13605CCTCCCTGACAAGCGCC 2250 GCTATAGGCGCTTGTC 4019 TATAGC AGGG 13586 13608CCCTGACAAGCGCCTAT 2251 AGTGCTATAGGCGCTT 4020 AGCACT GTCA 13587 13609CCTGACAAGCGCCTATA 2252 GAGTGCTATAGGCGCT 4021 GCACTC TGTC 13598 13620CCTATAGCACTCGAATA 2253 AAGAATTATTCGAGTG 4022 ATTCTT CTAT 13625 13647CCCTAACAGGTCAACCT 2254 GAAGCGAGGTTGACCT 4023 CGCTTC GTTA 13626 13648CCTAACAGGTCAACCTC 2255 GGAAGCGAGGTTGAC 4024 GCTTCC CTGTT 13639 13661CCTCGCTTCCCCACC 2256 TAGTAAGGGTGGGGA 4025 TACTAA AGCG 13647 13669CCCCACCCTTACTAACA 2257 CGTTAATGTTAGTAAG 4026 TTAACG GGTG 13648 13670CCCACCCTTACTAACAT 2258 TCGTTAATGTTAGTAA 4027 TAACGA GGGT 13649 13671CCACCCTTACTAACATT 2259 TTCGTTAATGTTAGTA 4028 AACGAA AGGG 13652 13674CCCTTACTAACATTAAC 2260 ATTTTCGTTAATGTTA 4029 GAAAAT GTAA 13653 13675CCTTACTAACATTAACG 2261 TATTTTCGTTAATGTTA 4030 AAAATA GTA 13677 13699CCCCACCCTACTATACC 2262 TAATGGGGTTTAGTAG 4031 CCATTA GGTG 13678 13700CCCACCCTACTAAACCC 2263 TTAATGGGGTTTAGTA 4032 CATTAA GGGT 13679 13701CCACCCTACTAAACCCC 2264 TTTAATGGGGTTTAGT 4033 ATTAAA AGGG 13682 13704CCCTACTAAACCCCATT 2265 
GCGTTTAATGGGGTTT 4034 AAACGC AGTA 13683 13705CCTACTAAACCCCATTA 2266 GGCGTTTAATGGGGTT 4035 AACGCC TAGT 13692 13714CCCCATTAAACGCCTGG 2267 CGGCTGCCAGGCGTTT 4036 CAGCCG AATG 13693 13715CCCATTAAACGCCTGGC 2268 CCGGCTGCCAGGCGTT 4037 AGCCGG TAAT 13694 13716CCATTAAACGCCTGGCA 2269 TCCGGCTGCCAGGCGT 4038 GCCGGA TTAA 13704 13726CCTGGCAGCCGGAAGCC 2270 CGAATAGGCTTCCGGC 4039 TATTCG TGCC 13712 13734CCGGAAGCCTATTCGCA 2271 AAATCCTGCGAATAGG 4040 GGATTT CTTC 13719 13741CCTATTCGCAGGATTTC 2272 TAATGAGAAATCCTGC 4041 TCATTA GAAT 13754 13776CCCCCGCATCCCCCTTC 2773 TGTTTGGAAGGGGGAT 4042 CAAACA GCGG 13755 13777CCCCGCATCCCCCTTCC 2274 TTGTTTGGAAGGGGGA 4043 AAACAA TGCG 13756 13778CCCGCATCCCCCTTCCA 7275 GTTGTTTGGAAGGGGG 4044 AACAAC ATGC 13757 13779CCGCATCCCCCTTCCAA 2276 TGTTGTTTGGAAGGGG 4045 ACAACA GATG 13763 13785CCCCCTTCCAAACAACA 2277 GGGGATTGTTGTTTGG 4046 ATCCCC AAGG 13764 13786CCCCTTCCAAACAACAA 2278 GGGGGATTGTTGTTTG 4047 TCCCCC GAAG 13765 13787CCCTTCCAAACAACAAT 2279 AGGGGGATTGTTGTTT 4048 CCCCCT GGAA 13766 13788CCTTCCAAACAACAATC 2280 GAGGGGGATTGTTGTT 4049 CCCCTC TGGA 13770 13792CCAAACAACAATCCCCC 2281 GGTAGAGGGGGATTGT 4050 TCTACC TGTT 13782 13804CCCCCTCTACCTAAAAC 2282 CTGTGAGTTTTAGGTA 4051 TCACAG GAGG 13783 13805CCCCTCTACCTAAAACT 2283 GCTGTGAGTTTTAGGT 4052 CACAGC AGAG 13784 13806CCCTCTACCTAAAACTC 2284 GGCTGTGAGTTTTAGG 4053 ACAGCC TAGA 13785 13807CCTCTACCTAAAACTCA 2285 GGGCTGTGAGTTTTAG 4054 CAGCCC GTAG 13791 13813CCTAAAACTCACAGCCC 2286 CAGCGAGGGCTGTGA 4055 TCGCTG GTTTT 13805 13827CCCTCGCTGTCACTTTC 2287 TCCTAGGAAAGTGACA 4056 CTAGGA GCGA 13806 13828CCTCGCTGTCACTTTCCT 2288 GTCCTAGGAAAGTGAC 4057 AGGAC AGCG 13821 13843CCTAGGACTTCTAACAG 2289 CTAGGGCTGTTAGAAG 4058 CCCTAG TCCT 13838 13860CCCTAGACCTCAACTAC 2290 GGTTAGGTAGTTGAGG 4059 CTAACC TCTA 13839 13861CCTAGACCTCAACTACC 2291 TGGTTAGGTAGTTGAG 4060 TAACCA GTCT 13845 13867CCTCAACTACCTAACCA 2292 GTTTGTTGGTTAGGTA 4061 ACAAAC GTTG 13854 13876CCTAACCAACAAACTTA 2293 TTATTTTAAGTTTGTTG 4062 AAATAA GTT 13859 13881CCAACAAACTTAAAATA 2294 GGATTTTATTTTAAGT 4063 
AAATCC TTGT 13880 13902CCCCACTATGCACATTT 2295 GAAATAAAATGTGCAT 4064 TATTTC AGTG 13881 13903CCCACTATGCACATTTT 2296 AGAAATAAAATGTGC 4065 ATTTCT ATAGT 13882 13904CCACTATGCACATTTTA 2297 GAGAAATAAAATGTG 4066 TTTCTC CATAG 13904 13926CCAACATACTCGGATTC 2298 AGGGTAGAATCCGAGT 4067 TACCCT ATGT 13923 13945CCCTAGCATCACACACC 2299 TTGTGCGGTGTGTGAT 4068 GCACAA GCTA 13924 13946CCTAGCATCACACACCG 2300 ATTGTGCGGTGTGTGA 4069 CACAAT TGCT 13938 13960CCGCACAATCCCCTATC 2301 GGCCTAGATAGGGGAT 4070 TAGGCC TGTG 13947 13969CCCCTATCTAGGCCTTC 2302 TCGTAAGAAGGCCTAG 4071 TTACGA ATAG 13948 13970CCCTATCTAGGCCTTCT 2303 CTCGTAAGAAGGCCTA 4072 TACGAG GATA 13949 13971CCTATCTAGGCCTTCTT 2304 GCTCGTAAGAAGGCCT 4073 ACGAGC AGAT 13959 13981CCTTCTTACGAGCCAAA 2305 GCAGGTTTTGGCTCGT 4074 ACCTGC AAGA 13971 13993CCAAAACCTGCCCCTAC 2306 GGAGGAGTAGGGGCA 4075 TCCTCC GGTTT 13977 13999CCTGCCCCTACTCCTCC 2307 GGTCTAGGAGGAGTA 4076 TAGACC GGGGC 13981 14003CCCCTACTCCTCCTAGA 2308 GTTAGGTCTAGGAGGA 4077 CCTAAC GTAG 13982 14004CCCTACTCCTCCTAGAC 2309 GGTTAGGTCTAGGAGG 4078 CTAACC AGTA 13983 14005CCTACTCCTCCTAGACC 2310 AGGTTAGGTCTAGGAG 4079 TAACCT GAGT 13989 14011CCTCCTAGACCTAACCT 2311 CTAGTCAGGTTAGGTC 4080 GACTAG TAGG 13992 14014CCTAGACCTAACCTGAC 2312 TTTCTAGTCAGGTTAG 4081 TAGAAA GTCT 13998 14020CCTAACCTGACTAGAAA 2313 ATAGCTTTTCTAGTCA 4082 AGCTAT GGTT 14003 14025CCTGACTAGAAAAGCTA 2314 AGGTAATAGCTTTTCT 4083 TTACCT AGTC 14023 14045CCTAAAACAATTTCACA 2315 TGGTGCTGTGAAATTG 4084 GCACCA TTTT 14043 14065CCAAATCTCCACCTCCA 2316 TGATGATGGAGGTGGA 4085 TCATCA GATT 14051 14073CCACCTCCATCATCACC 2317 GGTTGAGGTGATGATG 4086 TCAACC GAGG 14054 14076CCTCCATCATCACCTCA 2318 TTGGGTTGAGGTGATG 4087 ACCCAA ATGG 14057 14079CCATCATCACCTCAACC 2319 TTTTTGGGTTGAGGTG 4088 CAAAAA ATGA 14066 14088CCTCAACCCAAAAAGGC 2320 AATTATGCCTTTTTGG 4089 ATAATT GTTG 14072 14094CCCAAAAAGGCATAATT 2321 AAGTTTAATTATGCCT 4090 AAACTT TTTT 14073 14095CCAAAAAGGCATAATTA 2322 AAAGTTTAATTATGCC 4091 AACTTT TTTT 14100 14122CCTCTCTTTCTTCTTCCC 2323 TGAGTGGGAAGAAGA 4092 ACTCA AAGAG 14115 
14137CCCACTCATCCTAACCC 2324 GGAGTAGGGTTAGGAT 4093 TACTCC GAGT 14116 14138CCACTCATCCTAACCCT 2325 AGGAGTAGGGTTAGG 4094 ACTCCT ATGAG 14124 14146CCTAACCCTACTCCTAA 2326 ATGTGATTAGGAGTAG 4095 TCACAT GGTT 14129 14151CCCTACTCCTAATCACA 2327 AGGTTATGTGATTAGG 4096 TAACCT AGTA 14130 14152CCTACTCCTAATCACAT 2328 TAGGTTATGTGATTAG 4097 AACCTA GAGT 14136 14158CCTAATCACATAACCTA 2329 GGGGAATAGGTTATGT 4098 TTCCCC GATT 14149 14171CCTATTCCCCCGAGCAA 2330 TTGAGATTGCTCGGGG 4099 TCTCAA GAAT 14155 14177CCCCCGAGCAATCTCAA 2331 TTGTAATTGAGATTGC 4100 TTACAA TCGG 14156 14178CCCCGAGCAATCTCAAT 2332 ATTGTAATTGAGATTG 4101 TACAAT CTCG 14157 14179CCCGAGCAATCTCAATT 2333 TATTGTAATTGAGATT 4102 ACAATA GCTC 14158 14180CCGAGCAATCTCAATTA 2334 ATATTGTAATTGAGAT 4103 CAATAT TGCT 14186 14208CCAACAAACAATGTTCA 2335 ACTGGTTGAACATTGT 4104 ACCAGT TTGT 14204 14226CCAGTAACTACTACTAA 2336 CGTTGATTAGTAGTAG 4105 TCAACG TTAC 14227 14249CCCATAATCATACAAAG 2337 CGGGGGCTTTGTATGA 4106 CCCCCG TTAT 14228 14250CCATAATCATACAAAGC 2338 GCGGGGGCTTTGTATG 4107 CCCCGC ATTA 14244 14266CCCCCGCACCAATAGGA 2339 GGAGGATCCTATTGGT 4108 TCCTCC GCGG 14245 14267CCCCGCACCAATAGGAT 2340 GGGAGGATCCTATTGG 4109 CCTCCC TGCG 14246 14268CCCGCACCAATAGGATC 2341 CGGGAGGATCCTATTG 4110 CTCCCG GTGC 14247 14269CCGCACCAATAGGATCC 2342 TCGGGAGGATCCTATT 4111 TCCCGA GGTG 14252 14274CCAATAGGATCCTCCCG 2343 TTGATTCGGGAGGATC 4112 AATCAA CTAT 14262 14284CCTCCCGAATCAACCCT 2344 GGGGTCAGGGTTGATT 4113 GACCCC CGGG 14265 14287CCCGAATCAACCCTGAC 2345 AGAGGGGTCAGGGTT 4114 CCCTCT GATTC 14266 14288CCGAATCAACCCTGACC 2346 GAGAGGGGTCAGGGT 4115 CCTCTC TGATT 14275 14297CCCTGACCCCTCTCCTT 2347 TTTATGAAGGAGAGGG 4116 CATAAA GTCA 14276 14298CCTGACCCCTCTCCTTC 2348 ATTTATGAAGGAGAGG 4117 ATAAAT GGTC 14281 14303CCCCTCTCCTTCATAAA 2349 GAATAATTTATGAAGG 4118 TTATTC AGAG 14282 14304CCCTCTCCTTCATAAAT 2350 TGAATAATTTATGAAG 4119 TATTCA GAGA 14283 14305CCTCTCCTTCATAAATT 2351 CTGAATAATTTATGAA 4120 ATTCAG GGAG 14288 14310GCTTCATAAATTATTCA 2352 GGAAGCTGAATAATTT 4121 GCTTCC ATGA 14309 14331CCTACACTATTAAAGTT 2353 
GTGGTAAACTTTAATA 4122 TACCAC GTGT 14328 14350CCACAACCACCACCCCA 2354 GTATGATGGGGTGGTG 4123 TCATAC GTTG 14334 14356CCACCACCCCATCATAC 2355 GAAAGAGTATGATGG 4124 TCTTTC GGTGG 14337 14359CCACCCCATCATACTCT 2356 GGTGAAAGAGTATGAT 4125 TTCACC GGGG 14340 14362CCCCATCATACTCTTTT 2357 CTGGGTGAAAGAGTAT 4126 ACCCAC GATG 14341 14363CCCATCATACTCTTTCA 2358 TGTGGGTGAAAGAGTA 4127 CCCACA TGAT 14342 14364CCATCATACTCTTTCAC 2359 CTGTGGGTGAAAGAGT 4128 CCACAG ATGA 14358 14380CCCACAGCACCAATCCT 2360 GGAGGTAGGATTGGTG 4129 ACCTCC CTGT 14359 14381CCACAGCACCAATCCTA 2361 TGGAGGTAGGATTGGT 4130 CCTCCA GCTG 14367 14389CCAATCCTACCTCCATC 2362 GTTAGCGATGGAGGTA 4131 GCTAAC GGAT 14372 14394CCTACCTCCATCGCTAA 2363 GTGGGGTTAGCGATGG 4132 CCCCAC AGGT 14376 14398CCTCCATCGCTAACCCC 2364 TTTAGTGGGGTTAGCG 4133 ACTAAA ATGG 14379 14401CCATCGCTAACCCCACT 2365 TGTTTTAGTGGGGTTA 4134 AAAACA GCGA 14389 14411CCCCACTAAAACACTCA 2366 TCTTGGTGAGTGTTTT 4135 CCAAGA AGTG 14390 14412CCCACTAAAACACTCAC 2367 GTCTTGGTGAGTGTTT 4136 CAAGAC TAGT 14391 14413CCACTAAAACACTCACC 2368 GGTCTTGGTGAGTGTT 4137 AAGACC TTAG 14406 14428CCAAGACCTCAACCCCT 2369 GGGGTCAGGGGTTGA 4138 GACCCC GGTCT 14412 14434CCTCAACCCCTGACCCC 2370 GGCATGGGGGTCAGG 4139 CATGCC GGTTG 14418 14440CCCCTGACCCCCATGCC 2371 TCCTGAGGCATGGGGG 4140 TCAGGA TCAG 14419 14441CCCTGACCCCCATGCCT 2372 ATCCTGAGGCATGGGG 4141 CAGGAT GTCA 14420 14442CCTGACCCCCATGCCTC 2373 TATCCTGAGGCATGGG 4142 AGGATA GGTC 14425 14447CCCCCATGCCTCAGGAT 2374 AGGAGTATCCTGAGGC 4143 ACTCCT ATGG 14426 14448CCCCATGCCTCAGGATA 2375 GAGGAGTATCCTGAGG 4144 CTCCTC CATG 14427 14449CCCATGCCTCAGGATAC 2376 TGAGGAGTATCCTGAG 4145 TCCTCA GCAT 14428 14450CCATGCCTCAGGATACT 2377 TTGAGGAGTATCCTGA 4146 CCTCAA GGCA 14433 14455CCTCAGGATACTCCTCA 2378 GGCTATTGAGGAGTAT 4147 ATAGCC CCTG 14445 14467CCTCAATAGCCATCGCT 2379 TACTACAGCGATGGCT 4148 GTAGTA ATTG 14454 14476CCATCGCTGTAGTATAT 2380 CTTTGGATATACTACA 4149 CCAAAG GCGA 14471 14493CCAAAGACAACCATCAT 2381 GGGGGAATGATGGTTG 4150 TCCCCC TCTT 14481 14503CCATCATTCCCCCTAAA 2382 AATTTATTTAGGGGGA 4151 
TAAATT ATGA 14489 14511CCCCCTAAATAAATTAA 2383 GTTTTTTTAATTTATTT 4152 AAAAAC AGG 14490 14512CCCCTAAATAAATTAA 2384 AGTTTTTTTAATTTATT 4153 AAAACT TAG 14491 14513CCCTAAATAAATTAAAA 2385 TAGTTTTTTTAATTTAT 4154 AAACTA TTA 14492 14514CCTAAATAAATTAAAAA 2386 ATAGTTTTTTTAATTTA 4155 AACTAT TTT 14519 14541CCCATATAACCTCCCCC 2387 AATTTTGGGGGAGGTT 4156 AAAATT ATAT 14520 14542CCATATAACCTCCCCCA 2388 GAATTTTGGGGGAGGT 4157 AAATTC TATA 14528 14550CCTCCCCCAAAATTCAG 2389 ATTATTCTGAATTTTG 4158 AATAAT GGGG 14531 14553CCCCCAAAATTCAGAAT 2390 GTTATTATTCTGAATTT 4159 AATAAC TGG 14532 14554CCCCAAAATTCAGAATA 2391 TGTTATTATTCTGAATT 4160 ATAACA TTG 14533 14555CCCAAAATTCAGAATAA 2392 GTGTTATTATTCTGAA 4161 TAACAC TTTT 14534 14556CCAAAATTCAGAATAAT 2393 TGTGTTATTATTCTGA 4162 AACACA ATTT 14557 14579CCCGACCACACCGCTAA 2394 TGATTGTTAGCGGTGT 4163 CAATCA GGTC 14558 14580CCGACCACACCGCTAAC 2395 TTGATTGTTAGCGGTG 4164 AATCAA TGGT 14562 14584CCACACCGCTAACAATC 2396 AGTATTGATTGTTAGC 4165 AATACT GGTG 14567 14589CCGCTAACAATCAATAC 2397 GGTTTAGTATTGATTG 4166 TAAACC TTAG 14588 14610CCCCCATAAATAGGAGA 2398 AAGCCTTCTCCTATTT 4167 AGGCTT ATGG 14589 14611CCCCATAAATAGGAGA 2399 TAAGCCTTCTCCTATTT 4168 AGGCTTA ATG 14590 14612CCCATAAATAGGAGAA 2400 CTAAGCCTTCTCCTAT 4169 GGCTTAG TTAT 14591 14613CCATAAATAGGAGAAG 2401 TCTAAGCCTTCTCCTA 4170 GCTTAGA TTTA 14620 14642CCCCACAAACCCCATTA 2402 GTTTAGTAATGGGGTT 4171 CTAAAC TGTG 14621 14643CCCACAAACCCCATTAC 2403 GGTTTAGTAATGGGGT 4172 TAAACC TTGT 14622 14644CCACAAACCCCATTACT 2404 GGGTTTAGTAATGGGG 4173 AAACCC TTTG 14629 14651CCCCATTACTAAACCCA 2405 TGAGTGTGGGTTTAGT 4174 CACTCA AATG 14630 14652CCCATTACTAAACCCAC 2406 TTGAGTGTGGGTTTAG 4175 ACTCAA TAAT 14631 14653CCATTACTAAACCCACA 2407 GTTGAGTGTGGGTTTA 4176 CTCAAC GTAA 14642 14664CCCACACTCAACAGAAA 2408 GCTTTGTTTCTGTTGA 4177 CAAAGC GTGT 14643 14665CCACACTCAACAGAAAC 2409 TGCTTTGTTTCTGTTGA 4178 AAAGCA GTG 14694 14716CCACGACCAATGATATG 2410 GTTTTTCATATCATTG 4179 AAAAAC GTCG 14700 14722CCAATGATATGAAAAAC 2411 ACGATGGTTTTTCATA 4180 CATCGT TCAT 14716
14738CCATCTTGTATTTCAA 2412 TTGTAGTTGAAATACA 4181 CTACAA ACGA 14744 14766CCAATGACCCCAATACG 2413 GTTTTGCGTATTGGGG 4182 CAAAAC TCAT 14751 14773CCCCAATACGCAAAACT 2414 GGGGTTAGTTTTGCGT 4183 AACCCC ATTG 14752 14774CCCAATACGCAAAACTA 2415 GGGGGTTAGTTTTGCG 4184 ACCCCC TATT 14753 14775CCAATACGCAAAACTAA 2416 AGGGGGTTAGTTTTGC 4185 CCCCCT GTAT 14770 14792CCCCCTAATAAAATTAA 2417 GGTTAATTAATTTTAT 4186 TTAACC TAGG 14771 14793CCCCTAATAAAATTAAT 2418 TGGTTAATTAATTTTA 4187 TAACCA TTAG 14772 14794CCCTAATAAAATTAATT 2419 GTGGTTAATTAATTTT 4188 AACCAC ATTA 14773 14795CCTAATAAAATTAATTA 2420 AGTGGTTAATTAATTT 4189 ACCACT TATT 14791 14813CCACTCATTCATCGACC 2421 TGGGGAGGTCGATGA 4190 TCCCCA ATGAG 14806 14828CCTCCCCACCCCATCCA 2422 AGATGTTGGATGGGGT 4191 ACATCT GGGG 14809 14831CCCCACCCCATCCAACA 2423 CGGAGATGTTGGATGG 4192 TCTCCG GGTG 14810 14832CCCACCCCATCCAACAT 2424 GCGGAGATGTTGGATG 4193 CTCCGC GGGT 14811 14833CCACCCCATCCAACATC 2425 TGCGGAGATGTTGGAT 4194 TCCGCA GGGG 14814 14836CCCCATCCAACATCTCC 2426 TCATGCGGAGATGTTG 4195 GCATGA GATG 14815 14837CCCATCCAACATCTCCG 2427 ATCATGCGGAGATGTT 4196 CATGAT GGAT 14816 14838CCATCCAACATCTCCGC 2428 CATCATGCGGAGATGT 4197 ATGATG TGGA 14820 14842CCAACATCTCCGCATGA 2429 GTTTCATCATGCGGAG 4198 TGAAAC ATGT 14829 14851CCGCATGATGAAACTTC 2430 TGAGCCGAAGTTTCAT 4199 GGCTCA CATG 14854 14876CCTTGGCGCCTGCCTGA 2431 GGAGGATCAGGCAGG 4200 TCCTCC CGCCA 14862 14884CCTGCCTGATCCTCCAA 2432 GGTGATTTGGAGGATC 4201 ATCACC AGGC 14866 14888CCTGATCCTCCAAATCA 2433 CTGTGGTGATTTGGAG 4202 CCACAG GATC 14872 14894CCTCCAAATCACCACAG 2434 ATAGTCCTGTGGTGAT 4203 GACTAT TTGG 14875 14897CCAAATCACCACAGGAC 2435 GGAATAGTCCTGTGGT 4204 TATTCC GATT 14883 14905CCACAGGACTATTCCTA 2436 CATGGCTAGGAATAGT 4205 GCCATG CCTG 14896 14918CCTAGCCATGCACTACT 2437 CTGGTGAGTAGTGCAT 4206 CACCAG GGCT 14901 14923CCATGCACTACTCACCA 2438 GGCGTCTGGTGAGTAG 4207 GACGCC TGCA 14915 14937CCAGACGCCTCAACCGC 2439 GAAAAGGCGGTTGAG 4208 CTTTTC GCGTC 14922 14944CCTCAACCGCCTTTTCA 2440 GATTGATGAAAAGGC 4209 TCAATC GGTTG 14928 14950CCGCCTTTTCATCAATC 2441
GTGGGCGATTGATGAA 4210 GCCCAC AAGG 14931 14953CCTTTTCATCAATCGCC 2442 GATGTGGGCGATTGAT 4211 CACATC GAAA 14946 14968CCCACATCACTCGAGAC 2443 ATTTACGTCTCGAGTG 4212 GTAAAT ATGT 14947 14969CCACATCACTCGAGACG 2444 AATTTACGTCTCGAGT 4213 TAAATT GATG 14983 15005CCGCTACCTTCACGCCA 2445 CGCCATTGGCGTGAAG 4214 ATGGCG GTAG 14989 15011CCTTCACGCCAATGGCG 2446 TTGAGGCGCCATTGGC 4215 CCTCAA GTGA 14997 15019CCAATGGCGCCTCAATA 2447 AAAGAATATTGAGGC 4216 TTCTTT GCCAT 15006 15028CCTCAATATTCTTTATCT 2448 GAGGCAGATAAAGAA 4217 GCCTC TATTG 15025 15047CCTCTTCCTACACATCG 2449 CTCGCCCGATGTGTAG 4218 GGCGAG GAAG 15031 15053CCTACACATCGGGCGAG 2450 ATAGGCCTCGCCCGAT 4219 GCCTAT GTGT 15049 15071CCTATATTACGGATCAT 2451 AGAGAAATGATCCGTA 4220 TTCTCT ATAT 15081 15103CCTGAAACTTCGGCATT 2452 GAGGATAATGCCGATG 4221 ATCCTC TTTC 15100 15122CCTCCTGCTTGCAACTA 2453 TTGCTATAGTTGCAAG 4222 TAGCAA CAGG 15103 15125CCTGCTTGCAACTATAG 2454 CTGTTGCTATAGTTGC 4223 CAACAG AAGC 15126 15148CCTTCATAGGCTATGTC 2455 CGGGAGGACATAGCCT 4224 CTCCCG ATGA 15142 15164CCTCCCGTGAGGCCAAA 2456 ATGATATTTGGCCTCA 4225 TATCAT CGGG 15145 15167CCCGTGAGGCCAAATAT 2457 AGAATGATATTTGGCC 4226 CATTCT TCAC 15146 15168CCGTGAGGCCAAATATC 2458 CAGAATGATATTTGGC 4227 ATTCTG CTCA 15154 15176CCAAATATCATTCTGAG 2459 TGGCCCCTCAGAATGA 4228 GGGCCA TATT 15174 15196CCACAGTAATTACAAAC 2460 TAGTAAGTTTGTAATT 4229 TTACTA ACTG 15198 15220CCGCCATCCCATACATT 2461 TGTCCCAATGTATGGG 4230 GGGACA ATGG 15201 15223CCATCCCATACATTGGG 2462 GTCTGTCCCAATGTAT 4231 ACAGAC GGGA 15205 15227CCCATACATTGGGACAG 2463 CTAGGTCTGTCCCAAT 4232 ACCTAG GTAT 15206 15228CCATACATTGGGACAGA 2464 ACTAGGTCTGTCCCAA 4233 CCTAGT TGTA 15223 15245CCTAGTTCAATGAATCT 2465 CTCCTCAGATTCATTG 4234 GAGGAG AACT 15263 15285CCCACCCTCACACGATT 2466 GTAAAGAATCGTGTGA 4235 CTTTAC GGGT 15264 15286CCACCCTCACACGATTC 2467 GGTAAAGAATCGTGTG 4236 TTTACC AGGG 15267 15289CCCTCACACGATTCTTT 2468 AAAGGTAAAGAATCG 4237 ACCTTT TGTGA 15268 15290CCTCACACGATTCTTTA 2469 GAAAGGTAAAGAATC 4238 CCTTTC GTGTG 15285 15307CCTTTCACTTCATCTTGC 2470 GAAGGGCAAGATGAA 4239 CCTTC
GTGAA 15302 15324CCCTTCATTATTGCAGC 2471 GCTAGGGCTGCAATAA 4240 CCTAGC TGAA 15303 15325CCTTCATTATTGCAGCC 2472 TGCTAGGGCTGCAATA 4241 CTAGCA ATGA 15318 15340CCCTAGCAACACTCCAC 2473 TAGGAGGTGGAGTGTT 4242 CTCCTA GCTA 15319 15341CCTAGCAACACTCCACC 2474 ATAGGAGGTGGAGTGT 4243 TCCTAT TGCT 15331 15353CCACCTCCTATTCTTGC 2475 TTTCGTGCAAGAATAG 4244 ACGAAA GAGG 15334 15356CCTCCTATTCTTGCACG 2476 CCGTTTCGTGCAAGAA 4245 AAACGG TAGG 15337 15359CCTATTCTTGCACGAAA 2477 ATCCCGTTTCGTGCAA 4246 CGGGAT GAAT 15367 15389CCCCCTAGGAATCACCT 2478 AATGGGAGGTGATTCC 4247 CCCATT TAGG 15368 15390CCCCTAGGAATCACCTC 2479 GAATGGGAGGTGATTC 4248 CCATTC CTAG 15369 15391CCCTAGGAATCACCTCC 2480 GGAATGGGAGGTGATT 4249 CATTCC CCTA 15370 15392CCTAGGAATCACCTCCC 2481 CGGAATGGGAGGTGA 4250 ATTCCG TTCCT 15381 15403CCTCCCATTCCGATAAA 2482 GGTGATTTTATCGGAA 4251 ATCACC TGGG 15384 15406CCCATTCCGATAAAATC 2483 GAAGGTGATTTTATCG 4252 ACCTTC GAAT 15385 15407CCATTCCGATAAAATCA 2484 GGAAGGTGATTTTATC 4253 CCTTCC GGAA 15390 15412CCGATAAAATCACCTTC 2485 AGGGTGGAAGGTGATT 4254 CACCCT TTAT 15402 15424CCTTCCACCCTTACTAC 2486 GATTGTGTAGTAAGGG 4255 ACAATC TGGA 15406 15428CCACCCTTACTACACAA 2487 CTTTGATTGTGTAGTA 4256 TCAAAG AGGG 15409 15431CCCTTACTACACAATCA 2488 CGTCTTTGATTGTGTA 4257 AAGACG GTAA 15410 15432CCTTACTACACAATCAA 2489 GCGTCTTTGATTGTGT 4258 AGACGC AGTA 15432 15454CCCTCGGCTTACTTCTCT 2490 AAGGAAGAGAAGTAA 4259 TCCTT GCCGA 15433 15455CCTCGGCTTACTTCTCTT 2491 GAAGGAAGAGAAGTA 4260 CCTTC AGCCG 15451 15473CCTTCTCTCCTTAATGA 2492 TTAATGTCATTAAGGA 4261 CATTAA GAGA 15459 15481CCTTAATGACATTAACA 2493 GAATAGTGTTAATGTC 4262 CTATTC ATTA 15485 15507CCAGACCTCCTAGGCGA 2494 TCTGGGTCGCCTAGGA 4263 CCCAGA GGTC 15490 15512CCTCCTAGGCGACCCAG 2495 AATTGTCTGGGTCGCC 4264 ACAATT TAGG 15493 15515CCTAGGCGACCCAGACA 2496 TATAATTGTCTGGGTC 4265 ATTATA GCCT 15502 15524CCCAGACAATTATACCC 2497 TGGCTAGGGTATAATT 4266 TAGCCA GTCT 15503 15525CCAGACAATTATACCCT 2498 TTGGCTAGGGTATAAT 4267 AGCCAA TGTC 15516 15538CCCTAGCCAACCCCTTA 2499 GGTGTTTAAGGGGTTG 4268 AACACC GCTA 15517 
15539CCTAGCCAACCCCTTAA 2500 GGGTGTTTAAGGGGTT 4269 ACACCC GGCT 15522 15544CCAACCCCTTAAACACC 2501 GGGAGGGGTGTTTAAG 4270 CCTCCC GGGT 15526 15548CCCCTTAAACACCCCTC 2502 TGTGGGGAGGGGTGTT 4271 CCCACA TAAG 15527 15549CCCTTAAACACCCCTCC 2503 ATGTGGGGAGGGGTGT 4272 CCACAT TTAA 15528 15550CCTTAAACACCCCTCCC 2504 GATGTGGGGAGGGGT 4273 CACATC GTTTA 15537 15559CCCCTCCCCACATCAAG 2505 TTCGGGCTTGATGTGG 4274 CCCGAA GGAG 15538 15560CCCTCCCCACATCAAGC 2506 ATTCGGGCTTGATGTG 4275 CCGAAT GGGA 15539 15561CCTCCCCACATCAAGCC 2507 CATTCGGGCTTGATGT 4276 CGAATG GGGG 15542 15564CCCCACATCAAGCCCGA 2508 TATCATTCGGGCTTGA 4277 ATGATA TGTG 15543 15565CCCACATCAAGCCCGAA 2509 ATATCATTCGGGCTTG 4278 TGATAT ATGT 15544 15566CCACATCAAGCCCGAAT 2510 AATATCATCGGGCTT 4279 GATATT GATG 15554 15576CCCGAATGATATTTCCT 2511 GCGAATAGGAAATATC 4280 ATTCGC ATTC 15555 15577CCGAATGATATTTCCTA 2512 GGCGAATAGGAAATA 4281 TTCGCC TCATT 15568 15590CCTATTCGCCTACACAA 2513 GGAGAATTGTGTAGGC 4282 TTCTCC GAAT 15576 15598CCTACACAATTCTCCGA 2514 GACGGATCGGAGAATT 4283 GTGTGT GTGT 15589 15611CCGATCCGTCCCTAACA 2515 CTAGTTTGTTAGGGAC 4284 AACTAG GGAT 15594 15616CCGTCCCTAACAAACTA 2516 GCCTCCTAGTTTGTTA 4285 GGAGGC GGGA 15598 15620CCCTAACAAACTAGGAG 2517 GGACGCCTCCTAGTTT 4286 GCGTCC GTTA 15599 15621CCTAACAAACTAGGAG 2518 AGGACGCCTCCTAGTT 4287 GCGTCCT TGTT 15619 15641CCTTGCCCTATTACTAT 2519 GGATGGATAGTAATAG 4288 CCATCC GGCA 15624 15646CCCTATTACTATCCATC 2520 GATGAGGATGGATAGT 4289 CTCATC AATA 15625 15647CCTATTACTATCCATCC 2521 GGATGAGGATGGATA 4290 TCATCC GTAAT 15636 15658CCATCCTCATCCTAGCA 2522 GATTATTGCTAGGATG 4291 ATAATC AGGA 15640 15662CCTCATCCTAGCAATAA 2523 TGGGGATTATTGCTAG 4292 TCCCCA GATG 15646 15668CCTAGCAATAATCCCCA 2524 GGAGGATGGGGATTAT 4293 TCCTCC TGCT 15658 15680CCCCATCCTCCATATAT 2525 GTTTGGATATATGGAG 4294 CCAAAC GATG 15659 15681CCCATCCTCCATATATC 2526 TGTTTGGATATATGGA 4295 CAAACA GGAT 15660 15682CCATCCTCCATATATCC 2527 TTGTTTGGATATATGG 4296 AAACAA AGGA 15664 15686CCTCCATATATCCAAAC 2528 TTTGTTGTTTGGATAT 4297 AACAAA ATGG 15667 15689CCATATATCCAAACAAC 2529 
TGCTTTGTTGTTTGGAT 4298 AAAGCA ATA 15675 15697CCAAACAACAAAGCAT 2530 AAATATTATGCTTTGT 4299 AATATTT TGTT 15700 15722CCCACTAAGCCAATCAC 2531 AATAAAGTGATTGGCT 4300 TTTATT TAGT 15701 15723CCACTAACCAATCACT 2532 CAATAAAGTGATTGGC 4301 TTATTG TTAG 15709 15731CCAATCACTTTATTGAC 2533 CTAGGAGTCAATAAAG 4302 TCCTAG TGAT 15727 15749CCTAGCCGCAGACCTCC 2534 GAATGAGGAGGTCTGC 4303 TCATTC GGCT 15732 15754CCGCAGACCTCCTCATT 2535 GGTTAGAATGAGGAG 4304 CTAACC GTCTG 15739 15761CCTCCTCATTCTAACCT 2536 CGATTCAGGTTAGAAT 4305 GAATCG GAGG 15742 15764CCTCATTCTAACCTGAA 2537 CTCCGATTCAGGTTAG 4306 TCGGAG AATG 15753 15775CCTGAATCGGAGGACA 2538 TACTGGTTGTCCTCCG 4307 ACCAGTA ATTC 15770 15792CCAGTAAGCTACCCTTT 2539 ATGGTAAAAGGGTAG 4308 TACCAT CTTAC 15781 15803CCCTTTTACCATCATTG 2540 CTTGTCCAATGATGGT 4309 GACAAG AAAA 15782 15804CCTTTTACCATCATTGG 2541 ACTTGTCCAATGATGG 4310 ACAAGT TAAA 15789 15811CCATCATTGGACAAGTA 2542 GGATGCTACTTGTCCA 4311 GCATCC ATGA 15810 15832CCGTACTATACTTCACA 2543 GATTGTTGTGAAGTAT 4312 ACAATC AGTA 15832 15854CCTAATCCTAATACCAA 2544 AGATAGTTGGTATTAG 4313 CTATCT GATT 15838 15860CCTAATACCAACTATCT 2545 TTAGGGAGATAGTTGG 4314 CCCTAA TATT 15845 15867CCAACTATCTCCCTAAT 2546 TTTTCAATTAGGGAGA 4315 TGAAAA TAGT 15855 15877CCCTAATTGAAAACAAA 2547 GAGTATTTTGTTTTCA 4316 ATACTC ATTA 15856 15878CCTAATTGAAAACAAAA 2548 TGAGTATTTTGTTTTCA 4317 TACTCA ATT 15885 15907CCTGTCCTTGTAGTATA 2549 TTAGTTTATACTACAA 4318 AACTAA GGAC 15890 15912CCTTGTAGTATAAACTA 2550 GTGTATTAGTTTATAC 4319 ATACAC TACA 15912 15934CCAGTCTTGTAAACCGG 2551 TCATCTCCGGTTTACA 4320 AGATGA AGAC 15925 15947CCGGAGATGAAAACCTT 2552 TGGAAAAAGGTTTTCA 4321 TTTCCA TCTC 15938 15960CCTTTTTCCAAGGACAA 2553 TCTGATTTGTCCTTGG 4322 ATCAGA AAAA 15945 15967CCAAGGACAAATCAGA 2554 CTTTTTCTCTGATTTGT 4323 GAAAAAG CCT 15977 15999CCACCATTAGCACCCAA 2555 TTAGCTTTGGGTGCTA 4324 AGCTAA ATGG 15980 16002CCATTAGCACCCAAAGC 2556 ATCTTAGCTTTGGGTG 4325 TAAGAT CTAA 15989 16011CCCAAAGCTAAGATTCT 2557 TAAATTAGAATCTTAG 4326 AATTTA CTTT 15990 16012CCAAAGCTAAGATTCTA 2558 TTAAATTAGAATCTTA 4327 ATTTAA 
GCTT 16052 16074CCACCCAAGTATTGACT 2559 TGGGTGAGTCAATACT 4328 CACCCA TGGG 16055 16077CCCAAGTATTGACTCAC 2560 TGATGGGTGAGTCAAT 4329 CCATCA ACTT 16056 16078CCAAGTATTGACTCACC 2561 TTGATGGGTGAGTCAA 4330 CATCAA TACT 16071 16093CCCATCAACAACCGCTA 2562 AATACATAGCGGTTGT 4331 TGTATT TGAT 16072 16094CCATCAACAACCGCTAT 2563 AAATACATAGCGGTTG 4332 GTATTT TTGA 16082 16104CCGCTATGTATTTCGTA 2564 GTAATGTACGAAATAC 4333 CATTAC ATAG 16107 16129CCAGCCACCATGAATAT 2565 CGTACAATATTCATGG 4334 TGTACG TGGC 16111 16133CCACCATGAATATTGTA 2566 GTACCGTACAATATTC 4335 CGGTAC ATGG 16114 16136CCATGAATATTGTACGG 2567 ATGGTACCGTACAATA 4336 TACCAT TTCA 16133 16155CCATAAATACTTGACCA 2568 TACAGGTGGTCAAGTA 4337 CCTGTA TTTA 16147 16169CCACCTGTAGTACATAA 2569 GGGTTTTTATGTACTA 4338 AAACCC CAGG 16150 16172CCTGTAGTACATAAAAA 2570 ATTGGGTTTTTATGTA 4339 CCCAAT CTAC 16167 16189CCCAATCCACATCAAAA 2571 AGGGGGTTTTGATGTG 4340 CCCCCT GATT 16168 16190CCAATCCACATCAAAAC 2572 GAGGGGGTTTTGATGT 4341 CCCCTC GGAT 16173 16195CCACATCAAAACCCCCT 2573 ATGGGGAGGGGGTTTT 4342 CCCCAT GATG 16184 16206CCCCCTCCCCATGCTTA 2574 TGCTTGTAAGCATGGG 4343 CAAGCA GG 16185 16207CCCCTCCCCATGCTTAC 2575 TTGCTTGTAAGCATGG 4344 AAGCAA GGAG 16186 16208CCCTCCCCATGCTTACA 2576 CTTGCTTGTAAGCATG 4345 AGCAAG GGGA 16187 16209CCTCCCCATGCTTACAA 2577 ACTTGCTTGTAAGCAT 4346 GCAAGT GGGG 16190 16212CCCCATGCTTACAAGCA 2578 TGTACTTGCTTGTAAG 4347 AGTACA CATG 16191 16213CCCATGCTTACAAGCAA 2579 CTGTACTTGCTTGTAA 4348 GTACAG GCAT 16192 16214CCATGCTTACAAGCAAG 2580 GCTGTACTTGCTTGTA 4349 TACAGC AGCA 16221 16243CCCTCAACTATCACACA 2581 AGTTGATGTGTGATAG 4350 TCAACT TTGA 16222 16244CCTCAACTATCACACAT 2582 CAGTTGATGTGTGATA 4351 CAACTG GTTG 16250 16272CCAAAGCCACCCCTCAC 2583 TAGTGGGTGAGGGGTG 4352 CCACTA GCTT 16256 16278CCACCCCTCACCCACTA 2584 GTATCCTAGTGGGTGA 4353 GGATAC GGGG 16259 16281CCCCTCACCCACTAGGA 2585 TTGGTATCCTAGTGGG 4354 TACCAA TGAG 16260 16282CCCTCACCCACTAGGAT 2586 GTTGGTATCCTAGTGG 4355 ACCAAC GTGA 16261 16283CCTCACCCACTAGGATA 2587 TGTTGGTATCCTAGTG 4356 CCAACA GGTG 16266 
16288CCCACTAGGATACCAAC 2588 AGGTTTGTTGGTATCC 4357 AAACCT TAGT 16267 16289CCACTAGGATACCAACA 2589 TAGGTTTGTTGGTATC 4358 AACCTA CTAG 16278 16300CCAACAAACCTACCCAC 2590 TTAAGGGTGGGTAGGT 4359 CCTTAA TTGT 16286 16308CCTACCCACCCTTAACA 2591 ATGTACTGTTAAGGGT 4360 GTACAT GGGT 16290 16312CCCACCCTTAACAGTAC 2592 TACTATGTACTGTTAA 4361 ATAGTA GGGT 16291 16313CCACCCTTAACAGTACA 2593 GTACTATGTACTGTTA 4362 TAGTAC AGGG 16294 16316CCCTTAACAGTACATAG 2594 TATGTACTATGTACTG 4363 TACATA TTAA 16295 16317CCTTAACAGTACATAGT 2595 TTATGTACTATGTACT 4364 ACATAA GTTA 16320 16342CCATTTACCGTACATAG 2596 AATGTGCTATGTACGG 4365 CACATT TAAA 16327 16349CCGTACATAGCACATTA 2597 TGACTGTAATGTGCTA 4366 CAGTCA TGTA 16353 16375CCCTTCTCGTCCCCATG 2598 GTCATCCATGGGGACG 4367 GATGAC AGAA 16354 16376CCTTCTCGTCCCCATGG 2599 GGTCATCCATGGGGAC 4368 ATGACC GAGA 16363 16385CCCCATGGATGACCCCC 2600 TCTGAGGGGGGTCAT 4369 CTCAGA CATG 16364 16386CCCATGGATGACCCCCC 2601 ATCTGAGGGGGGTCAT 4370 TCAGAT CCAT 16365 16387CCATGGATGACCCCCCT 2602 TATCTGAGGGGGGTCA 4371 CAGATA TCCA 16375 16397CCCCCCTCAGATAGGGG 2603 AAGGGACCCCTATCTG 4372 TCCCTT AGGG 16376 16398CCCCCTCAGATAGGGGT 2604 CAAGGGACCCCTATCT 4373 CCCTTG GAGG 16377 16399CCCCTCAGATAGGGGTC 2605 TCAAGGGACCCCTATC 4374 CCTTGA TGAG 16378 16400CCCTCAGATAGGGGTCC 2606 GTCAAGGGACCCCTAT 4375 CTTGAC CTGA 16379 16401CCTCAGATAGGGGTCCC 2607 GGTCAAGGGACCCCTA 4376 TTGACC TCTG 16393 16415CCCTTGACCACCATCCT 2608 TCACGGAGGATGGTGG 4377 CCGTGA TCAA 16394 16416CCTTGACCACCATCCTC 2609 TTCACGGAGGATGGTG 4378 CGTGAA GTCA 16400 16422CCACCATCCTCCGTGAA 2610 ATTGATTTCACGGAGG 4379 ATCAAT ATGG 16403 16425CCATCCTCCGTGAAATC 2611 GATATTGATTTCACGG 4380 AATATC AGGA 16407 16429CCTCCGTGAAATCAATA 2612 GCGGGATATTGATTTC 4381 TCCCGC ACGG 16410 16432CCGTGAAATCAATATCC 2613 TGTGCGGGATATTGAT 4382 CGCACA TTCA 16425 16447CCCGCACAAGAGTGCTA 2614 GGAGAGTAGCACTCTT 4383 CTCTCC GTGC 16426 16448CCGCACAAGAGTGCTAC 2615 AGGAGAGTAGCACTCT 4384 TCTCCT TGTG 16446 16468CCTCGCTCCGGGCCCAT 2616 AGTGTTATGGGCCCGG 4385 AACACT AGCG 16453 16475CCGGGCCCATAACACTT 2617 
ACCCCCAAGTGTTATG 4386 GGGGGT GGCC 16458 16480CCCATAACACTTGGGGG 2618 TAGCTACCCCCAAGTG 4387 TAGCTA TTAT 16459 16481CCATAACACTTGGGGGT 2619 TTAGCTACCCCCAAGT 4388 AGCTAA GTTA 16494 16516CCGACATCTGGTTCCTA 2620 CTGAAGTAGGAACCA 4389 CTTCAG GATGT 16507 16529CCTACTTCAGGGTCATA 2621 AGGCTTTATGACCCTG 4390 AAGCCT AAGT 16527 16549CCTAAATAGCCCACACG 2622 GGGGAACGTGTGGGCT 4391 TTCCCC ATTT 16536 16558CCCACACGTTCCCCTTA 2623 CTTATTTAAGGGGAAC 4392 AATAAG GTGT 16537 16559CCACACGTTCCCCTTAA 2624 TCTTATTTAAGGGGAA 4393 ATAAGA CGTG 16546 16568CCCCTTAAATAAGACAT 2625 ATCGTGATGTCTTATT 4394 CACGAT TAAG 16547 16569CCCTTAAATAAGACATC 2626 CATCGTGATGTCTTAT 4395 ACGATG TT 16548 16570CCTTAAATAAGACATCA 2627 CCATCGTGATGTCTTA 4396 CGATGG TTTA
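Each row of the table above appears to list a 23-nt window on the reference strand that begins with CC (start and end coordinates, with end = start + 22), split across two columns, together with the 20-nt gRNA targeting sequence read from the opposite strand, also split across two columns. A CC on the reference strand is the complement of an NGG PAM on the other strand. This reading of the column layout is inferred from the rows themselves rather than stated in this section. The minimal Python sketch below, which assumes the SpCas9 NGG PAM and that reading of the table, enumerates such sites in a sequence. `find_cc_sites` and `CcSite` are hypothetical names used only for illustration.

```python
from typing import Iterator, NamedTuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


class CcSite(NamedTuple):
    start: int   # 1-based coordinate of the first C on the reference strand
    end: int     # 1-based inclusive end coordinate (start + 22)
    window: str  # 23-nt reference-strand window beginning with "CC"
    spacer: str  # 20-nt targeting sequence on the opposite strand
    pam: str     # 3-nt PAM on the opposite strand (always ends in "GG")


def find_cc_sites(seq: str, first_coord: int = 1) -> Iterator[CcSite]:
    """Yield every 23-nt window that starts with CC (an NGG PAM on the other strand).

    `first_coord` is the 1-based reference coordinate of seq[0].
    """
    seq = seq.upper()
    for i in range(len(seq) - 22):  # only windows that fit entirely inside seq
        if not seq.startswith("CC", i):
            continue
        window = seq[i:i + 23]
        rc = reverse_complement(window)  # opposite strand, read 5' to 3'
        yield CcSite(first_coord + i, first_coord + i + 22, window, rc[:20], rc[20:])


if __name__ == "__main__":
    for site in find_cc_sites("CCCCTCTCCTTCATAAATTATTCAG", first_coord=14281):
        print(site.start, site.end, site.spacer, site.pam)
```

As a consistency check, scanning the 25-nt stretch starting at position 14281 (CCCCTCTCCTTCATAAATTATTCAG, assembled from three adjacent rows) yields sites beginning at 14281, 14282 and 14283, and the spacer for 14281 is GAATAATTTATGAAGGAGAG, matching the two split columns of that row.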

Applications

The gNAs (e.g., gRNAs) and collections of gNAs (e.g., gRNAs) provided herein are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest; genome-wide labeling; genome-wide editing; genome-wide functional screens; and genome-wide regulation.

In one embodiment, the gNAs are selective for host nucleic acids in a biological sample from a host, but are not selective for non-host nucleic acids in the sample. In one embodiment, the gNAs are selective for non-host nucleic acids in a biological sample from a host, but are not selective for the host nucleic acids in the sample. In one embodiment, the gNAs are selective for both host nucleic acids and a subset of the non-host nucleic acids in a biological sample from a host. For example, where a complex biological sample comprises host nucleic acids and nucleic acids from more than one non-host organism, the gNAs may be selective for more than one of the non-host species. In such embodiments, the gNAs are used to serially deplete or partition the sequences that are not of interest. For example, saliva from a human contains human DNA as well as the DNA of more than one bacterial species, but may also contain the genomic material of an unknown pathogenic organism. In such an embodiment, gNAs directed at the human DNA and at the known bacteria can be used to serially deplete the human DNA and then the DNA of the known bacteria, resulting in a sample that retains the genomic material of the unknown pathogenic organism.

In an exemplary embodiment, the gNAs are selective for human host DNA obtained from a biological sample from the host, but do not hybridize with DNA from an unknown pathogen(s) also obtained from the sample.
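Selectivity of this kind can be approximated in silico when choosing which gNAs to keep in a collection. The minimal Python sketch below assumes that a gNA is "selective" for a sequence when its targeting sequence occurs verbatim on either strand of that sequence; a real design would also consider PAM adjacency and tolerated mismatches, which are deliberately omitted. The function names and the short toy sequences (standing in for real 20-nt spacers and genomes) are hypothetical.

```python
from typing import Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def occurs_on_either_strand(spacer: str, seq: str) -> bool:
    """Exact-match test on both strands (no mismatches, no PAM check)."""
    return spacer in seq or reverse_complement(spacer) in seq


def selective_spacers(spacers: Iterable[str],
                      target_seqs: List[str],
                      off_target_seqs: List[str]) -> List[str]:
    """Keep spacers that hit at least one target sequence and no off-target sequence."""
    kept = []
    for spacer in spacers:
        hits_target = any(occurs_on_either_strand(spacer, t) for t in target_seqs)
        hits_off_target = any(occurs_on_either_strand(spacer, o) for o in off_target_seqs)
        if hits_target and not hits_off_target:
            kept.append(spacer)
    return kept


if __name__ == "__main__":
    host = ["ACGTTGCAAGGCTTAA"]                 # toy "host" reference
    non_host = ["TTTTGGCCAAAACCCCACGTTG"]       # toy non-host reference
    candidates = ["ACGTTG", "GGCCAA", "TTAAGC", "CCCCAA"]
    print(selective_spacers(candidates, host, non_host))  # ['TTAAGC']
```

Here ACGTTG is discarded because it also occurs in the non-host sequence, while TTAAGC is kept because its reverse complement (GCTTAA) occurs only in the host sequence.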

In some embodiments, the gNAs are useful for depleting and partitioning targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample, by a method comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collections of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.

In some embodiments, the gNAs are useful in a method of depleting and partitioning targeted sequences in a sample, comprising: providing nucleic acids extracted from a sample, wherein the extracted nucleic acids comprise sequences of interest and targeted sequences for depletion or partitioning; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the nucleic acids in the sample.

In some embodiments, the gNAs are useful for enriching a sample for non-host nucleic acids by a method comprising: providing a sample comprising host nucleic acids and non-host nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the host nucleic acids in the sample, thereby depleting the sample of host nucleic acids and allowing for the enrichment of non-host nucleic acids.

In some embodiments, the gNAs are useful in a method for serially depleting targeted nucleic acids in a sample, comprising: providing a biological sample from a host comprising host nucleic acids and non-host nucleic acids, wherein the non-host nucleic acids comprise nucleic acids from at least one known non-host organism and nucleic acids from an unknown non-host organism; providing a plurality of complexes comprising (i) a collection of gNAs provided herein, directed at the host nucleic acids, and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins; mixing the nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes (e.g., gRNA-CRISPR/Cas system protein complexes) configured to hybridize to targeted sequences in the host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the host nucleic acids, and wherein at least a portion of the host nucleic acids are cleaved; mixing the remaining nucleic acids from the biological sample with gNA-nucleic acid-guided nuclease system protein complexes configured to hybridize to targeted sequences in the nucleic acids of the at least one known non-host organism, wherein at least a portion of these complexes hybridizes to the targeted sequences in the known non-host nucleic acids, and wherein at least a portion of the known non-host nucleic acids are cleaved; and isolating the remaining nucleic acids from the unknown non-host organism and preparing them for further analysis.
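The depletion and serial-depletion methods above can be illustrated with a toy in-silico model. The Python sketch below assumes SpCas9-style recognition (a 20-nt spacer immediately followed by an NGG PAM, on either strand) and treats any fragment carrying a recognized site as removed from the library, for example because a cleaved fragment is no longer amplifiable downstream; both are simplifying assumptions, and the names `is_cleaved`, `deplete` and `serial_deplete` are hypothetical. Each round uses its own gNA set, mirroring the host-first, known-non-host-second order described above.

```python
import re
from typing import Dict, List, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_cleaved(fragment: str, spacer: str) -> bool:
    """True if spacer + NGG PAM occurs on either strand of the fragment."""
    site = re.compile(re.escape(spacer) + "[ACGT]GG")
    return bool(site.search(fragment) or site.search(reverse_complement(fragment)))


def deplete(fragments: Sequence[str], spacers: Sequence[str]) -> List[str]:
    """Return only the fragments that no spacer in this round can cleave."""
    return [f for f in fragments if not any(is_cleaved(f, s) for s in spacers)]


def serial_deplete(fragments: Sequence[str],
                   rounds: Sequence[Tuple[str, Sequence[str]]]
                   ) -> Tuple[List[str], Dict[str, int]]:
    """Apply depletion rounds in order; record how many fragments survive each."""
    remaining = list(fragments)
    survivors: Dict[str, int] = {}
    for name, spacers in rounds:
        remaining = deplete(remaining, spacers)
        survivors[name] = len(remaining)
    return remaining, survivors


if __name__ == "__main__":
    host_frag = "AAAACCCCTCTCCTTCATAAATTATTCTTTT"         # site on the minus strand
    known_frag = "TT" + "ACGATCGATCGGATCCATGC" + "TGG" + "AA"  # site on the plus strand
    unknown_frag = "AT" * 13                                  # no site for either set
    rounds = [("host", ["GAATAATTTATGAAGGAGAG"]),
              ("known non-host", ["ACGATCGATCGGATCCATGC"])]
    left, counts = serial_deplete([host_frag, known_frag, unknown_frag], rounds)
    print(left, counts)
```

In this toy run only the "unknown" fragment survives both rounds; a fragment containing the host spacer without an adjacent NGG (for example, GAATAATTTATGAAGGAGAGTTT) is not counted as cleaved.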

In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted functional screens in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein, such that gNA-directed editing by the nucleic acid-guided nuclease system protein can be achieved at sequences across the entire genome or in a specific region of the genome. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as DNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as mRNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as protein. In one exemplary embodiment, the nucleic acid-guided nuclease system protein is Cas9.

In some embodiments, the gNAs generated herein are used for the selective capture and/or enrichment of nucleic acid sequences of interest. For example, in some embodiments, the gNAs generated herein are used in a method for capturing target nucleic acid sequences, comprising: providing a sample comprising a plurality of nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins. Once the sequences of interest are captured, they can be further ligated to create, for example, a sequencing library.

In some embodiments, the gNAs generated herein are used in a method for introducing labeled nucleotides at targeted sites of interest, comprising: (a) providing a sample comprising a plurality of nucleic acid fragments; (b) contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-nickases (e.g., Cas9 nickases), wherein the gNAs are complementary to targeted sites of interest in the nucleic acid fragments, thereby generating a plurality of nucleic acid fragments nicked at the targeted sites of interest; and (c) contacting the plurality of nicked nucleic acid fragments with labeled nucleotides and an enzyme capable of initiating nucleic acid synthesis at a nicked site, thereby generating a plurality of nucleic acid fragments comprising labeled nucleotides at the targeted sites of interest.
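For step (b), it can help to predict where along each fragment the nick, and therefore the start of labeled synthesis, will fall. The sketch below assumes the well-characterized SpCas9 geometry, in which cleavage occurs 3 nt 5' of the NGG PAM (between positions 17 and 18 of a 20-nt protospacer). A nickase cuts only one of the two strands at that position; which strand depends on which nuclease domain has been inactivated, and the sketch does not model that choice. Positions are reported as 0-based boundaries on the input strand, and `cut_boundaries` is a hypothetical name.

```python
import re
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cut_boundaries(fragment: str, spacer: str) -> List[int]:
    """Return sorted 0-based cut boundaries (bases to the left of the cut, in
    input-strand coordinates) for every spacer + NGG site on either strand.

    Assumes the cut lies 3 nt 5' of the PAM, i.e. len(spacer) - 3 nt into the
    protospacer.
    """
    offset = len(spacer) - 3
    # Lookahead so that overlapping sites are all reported.
    site = re.compile("(?=" + re.escape(spacer) + "[ACGT]GG)")
    n = len(fragment)
    cuts = set()
    # Sites read directly on the input strand.
    for m in site.finditer(fragment):
        cuts.add(m.start() + offset)
    # Sites on the opposite strand: a boundary b in the reverse complement
    # corresponds to boundary n - b on the input strand.
    for m in site.finditer(reverse_complement(fragment)):
        cuts.add(n - (m.start() + offset))
    return sorted(cuts)


if __name__ == "__main__":
    spacer = "GAATAATTTATGAAGGAGAG"
    plus = "AAAA" + spacer + "GGG" + "TTTT"        # site on the input strand
    minus = "AAAACCCCTCTCCTTCATAAATTATTCTTTT"     # same site on the opposite strand
    print(cut_boundaries(plus, spacer), cut_boundaries(minus, spacer))  # [21] [10]
```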

In some embodiments, the gNAs generated herein are used in a method for capturing target nucleic acid sequences of interest, comprising: (a) providing a sample comprising a plurality of adapter-ligated nucleic acids, wherein the nucleic acids are ligated to a first adapter at one end and to a second adapter at the other end; and (b) contacting the sample with a plurality of dead nucleic acid-guided nuclease-gNA complexes (e.g., dCas9-gRNA complexes) comprising a collection of gNAs, wherein the dead nucleic acid-guided nuclease (e.g., dCas9) is fused to a transposase, wherein the gNAs are complementary to targeted sites of interest contained in a subset of the nucleic acids, and wherein the dead nucleic acid-guided nuclease-gNA-transposase complexes (e.g., dCas9-gRNA-transposase complexes) are loaded with a plurality of third adapters, to generate a plurality of nucleic acid fragments comprising either a first or a second adapter at one end and a third adapter at the other end. In one embodiment, the method further comprises amplifying the product of step (b) by PCR using primers specific for the first or second adapter and for the third adapter.

In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted activation or repression in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a catalytically dead nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein fused to an activator or repressor domain (catalytically dead nucleic acid-guided nuclease system protein-fusion protein), such that gNA-directed activation or repression mediated by the catalytically dead nucleic acid-guided nuclease system protein can be achieved at sequences across the entire genome or in a specific region of the genome. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as DNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as mRNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as protein. In some embodiments, the collection of gNAs or nucleic acids encoding for gNAs exhibits specificity for more than one nucleic acid-guided nuclease system protein. In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease system protein is dCas9.

In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for Cas9 and one or more CRISPR/Cas system proteins selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for various catalytically dead CRISPR/Cas system proteins fused to different fluorophores, for example for use in the labeling and/or visualization of different genomes or portions of genomes, of different chromosomal regions, or of the integration of viral genes/genomes into a genome.

In some embodiments, the collection of gNAs (or nucleic acids encoding for gNAs) has specificity for different nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins and targets different sequences of interest, for example from different species. For example, a first subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a first species can first be mixed with a first nucleic acid-guided nuclease system protein member (or an engineered version thereof); and a second subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a second species can be mixed with a second, different nucleic acid-guided nuclease system protein member (or an engineered version thereof). In one embodiment, the nucleic acid-guided nuclease system proteins can be catalytically dead versions (for example dCas9) fused with different fluorophores, so that different targeted sequences of interest, e.g., the genomes of different species or different chromosomes of one species, can be labeled with different fluorescent labels. For example, different chromosomal regions can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of genetic translocations. For example, different viral genomes can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of the integration of different viral genomes into the host genome. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with either an activation or a repression domain, so that different targeted sequences of interest, e.g., different chromosomes of a genome, can be differentially regulated. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused to different protein domains that can be recognized by different antibodies, so that different targeted sequences of interest, e.g., different DNA sequences within a sample mixture, can be differentially isolated.

Exemplary Compositions of the Invention

In one embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase nucleic acid-guided nuclease-gNA complex, and labeled nucleotides. In one exemplary embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase Cas9-gRNA complex, and labeled nucleotides. In such embodiments, the nucleic acid may comprise DNA. The nucleotides can be labeled, for example, with biotin. The nucleotides can be part of an antibody-conjugate pair.

In one embodiment, provided herein is a composition comprising a nucleic acid fragment and a catalytically dead nucleic acid-guided nuclease-gNA complex, wherein the catalytically dead nucleic acid-guided nuclease is fused to a transposase. In one exemplary embodiment, provided herein is a composition comprising a DNA fragment and a dCas9-gRNA complex, wherein the dCas9 is fused to a transposase.

In one embodiment, provided herein is a composition comprising a nucleic acid fragment comprising methylated nucleotides, a nickase nucleic acid-guided nuclease-gNA complex, and unmethylated nucleotides. In an exemplary embodiment, provided herein is a composition comprising a DNA fragment comprising methylated nucleotides, a nickase Cas9-gRNA complex, and unmethylated nucleotides.

In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-DNA endonuclease. In an exemplary embodiment, the nucleic acid-guided-DNA endonuclease is NgAgo.

In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-RNA endonuclease.

In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-DNA endonuclease.

In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-RNA endonuclease. In one embodiment, the nucleic acid-guided-RNA endonuclease comprises C2c2.

Kits and Articles of Manufacture

The present application provides kits comprising any one or more of the compositions described herein, including but not limited to adapters, gNAs (e.g., gRNAs), gNA collections (e.g., gRNA collections), nucleic acid molecules encoding the gNA collections, and the like.

In one exemplary embodiment, the kit comprises a collection of DNA molecules capable of being transcribed into a library of gRNAs, wherein the gRNAs are targeted to human genomic DNA sequences or to DNA sequences from other sources.

In one embodiment, the kit comprises a collection of gNAs, wherein the gNAs are targeted to human genomic DNA sequences or to DNA sequences from other sources.

In some embodiments, provided herein are kits comprising any of the collections of nucleic acids encoding gNAs described herein. In some embodiments, provided herein are kits comprising any of the collections of gNAs described herein.

The present application also provides all essential reagents and instructions for carrying out the methods of making the gNAs and the collections of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits that comprise all essential reagents and instructions for carrying out the methods of making individual gNAs and collections of gNAs as described herein.

Also provided herein is computer software for monitoring information before and after contacting a sample with a gNA collection produced herein. In one exemplary embodiment, the software can compute and report the abundance of non-target sequences in the sample before and after the gNA collection is provided, to ensure that no off-target targeting occurs, and the software can check the efficacy of targeted depletion, enrichment, capture, partitioning, labeling, regulation, or editing by comparing the abundance of the target sequence before and after the gNA collection is provided to the sample.
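The before/after comparison performed by such software can be sketched as follows. This is a minimal illustration, not the software described here: it assumes each sequencing read has already been assigned a label (the labels, the functions `label_fractions` and `fold_change`, and the read counts are all hypothetical) and reports how the relative abundance of each label changes after the gNA collection is applied.

```python
from collections import Counter


def label_fractions(labels):
    """Fraction of reads carrying each label (e.g. 'host', 'pathogen')."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {label: n / total for label, n in counts.items()}


def fold_change(before, after, label):
    """Ratio of a label's read fraction after vs. before gNA treatment.

    Values below 1 indicate depletion of that label; values above 1
    indicate relative enrichment.
    """
    f_before = label_fractions(before).get(label, 0.0)
    f_after = label_fractions(after).get(label, 0.0)
    if f_before == 0:
        raise ValueError(f"label {label!r} not observed before treatment")
    return f_after / f_before


# Hypothetical read labels, e.g. assigned by aligning reads to references.
before = ["host"] * 990 + ["pathogen"] * 10
after = ["host"] * 500 + ["pathogen"] * 500

print(fold_change(before, after, "host"))      # host fraction falls
print(fold_change(before, after, "pathogen"))  # pathogen fraction rises
```

Relative fractions are used here because read totals usually differ between runs. Because depleting one label necessarily raises the fractions of the others, a practical off-target check may also need an absolute reference such as a spike-in, which this sketch does not model.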

The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.

Examples

Example 1: Construction of a gRNA Library from a T7 Promoter Human DNA Library

T7 Promoter Library Construction

Human genomic DNA (400 ng) was fragmented using an S2 Covaris sonicator (Covaris) for 8 cycles, to yield fragments of 200-300 bp in length. Fragmented DNA was repaired using the NEBNext End Repair Module (NEB) and incubated at 25° C. for 30 min, then heat inactivated at 75° C. for 20 min. To make T7 promoter adapters, oligos T7-1 (5′GCCTCGAGC*T*A*ATACGACTCACTATAGAG3′, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4397) and T7-2 (sequence 5′Phos-CTCTATAGTGAGTCGTATTA3′) (SEQ ID NO: 4398) were admixed at 15 μM, heated to 98° C. for 3 min, then cooled slowly (0.1° C./min) to 30° C. T7 promoter blunt adapters (15 pmol total) were then added to the blunt-ended human genomic DNA fragments and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((2) in FIG. 1). Ligations were amplified with 2 μM oligo T7-1, using Hi-Fidelity 2× Master Mix (NEB), for 10 cycles of PCR (98° C. for 20 s, 63° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
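One way to see why the annealed T7-1/T7-2 pair behaves as a blunt adapter is to check where T7-2 pairs along T7-1. The sketch below is an illustrative check, not part of the described protocol; the helper names `revcomp` and `duplex_layout` are hypothetical, and it ignores the phosphorothioate linkages and the 5′ phosphate, which do not change base pairing.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_layout(long_oligo: str, short_oligo: str):
    """Where does `short_oligo` anneal along `long_oligo`?

    Returns (unpaired_5p, unpaired_3p): the number of long-oligo bases left
    single-stranded at its 5' and 3' ends. (x, 0) means the duplex is blunt
    at the long oligo's 3' end.
    """
    site = revcomp(short_oligo)
    offset = long_oligo.find(site)
    if offset < 0:
        raise ValueError("short oligo is not fully complementary to long oligo")
    return offset, len(long_oligo) - offset - len(site)


# T7-1 with the phosphorothioate marks (*) removed; T7-2 without 5' phosphate.
t7_1 = "GCCTCGAGCTAATACGACTCACTATAGAG"
t7_2 = "CTCTATAGTGAGTCGTATTA"
print(duplex_layout(t7_1, t7_2))  # (9, 0)
```

With these two oligos, T7-2 pairs with the 3′-terminal 20 nt of T7-1, leaving a 9-nt single-stranded 5′ extension (GCCTCGAGC) and a blunt end at the 3′ end of T7-1, which is also the end carrying the 5′ phosphate of T7-2.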

Digestion of DNA

PCR amplified T7 promoter DNA (2 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 10 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl₂, 100 μg/mL BSA) for 10 min at 37° C. ((3) in FIG. 1), then heat inactivated at 75° C. for 20 min. An additional 10 μL of NEB buffer 2 with 1 μL of T7 Endonuclease I (NEB) was added to the reaction and incubated at 37° C. for 20 min ((4) in FIG. 1). Enzymatic digestion of DNA was verified by agarose gel electrophoresis. Digested DNA was recovered by adding 0.6× AxyPrep beads (Axygen), according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
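The Nt.CviPII step relies on the CCD motif (D = A, G, or T); a CCD motif on the bottom strand reads as HGG (H = A, C, or T) on the top strand. A minimal sketch for listing candidate motif positions in a sequence is shown below. It only locates motifs, does not model where or how efficiently the enzyme nicks, and the function name `cvipii_motifs` is hypothetical.

```python
import re

# IUPAC codes: D = A, G or T (not C); H = A, C or T (not G).
# Zero-width lookaheads let overlapping motifs all be reported.
CCD_TOP = re.compile(r"(?=CC[AGT])")
HGG_TOP = re.compile(r"(?=[ACT]GG)")


def cvipii_motifs(seq: str):
    """0-based start positions of CCD motifs read on the top strand, and of
    HGG motifs on the top strand (i.e. CCD read on the bottom strand)."""
    seq = seq.upper()
    top = [m.start() for m in CCD_TOP.finditer(seq)]
    bottom = [m.start() for m in HGG_TOP.finditer(seq)]
    return top, bottom


print(cvipii_motifs("ACCAGTTGGCCCT"))  # ([1, 10], [6])
```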

Ligation of Adapters and Removal of HGG

DNA was then blunted using T4 DNA Polymerase (NEB) for 20 min at 25° C., followed by heat inactivation at 75° C. for 20 min ((5) in FIG. 1).

To make MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4399) and MlyI-2 (sequence 5′>3′, TCACTATAGGGATCCGAGTCCC) (SEQ ID NO: 4400) were admixed at 15 μM, heated to 98° C. for 3 min, then cooled slowly (0.1° C./min) to 30° C. MlyI adapters (15 pmol total) were then added to T4 DNA Polymerase-blunted DNA and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 1). Ligations were heat inactivated at 75° C. for 20 min, then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., so that HGG motifs are eliminated ((7) in FIG. 1). Digests were then cleaned using 0.8× AxyPrep beads (Axygen), and DNA was resuspended in 10 μL of 10 mM Tris-Cl pH 8.
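MlyI is a Type IIS enzyme commonly listed as GAGTC(5/5): it recognizes GAGTC and cuts both strands 5 nt downstream, leaving a blunt end. The hedged sketch below (the function name `mlyi_cuts` and the toy fragment are hypothetical; only this recognition rule is modeled) predicts cut positions on a double-stranded sequence given as its top strand, and shows how the adapter orientation lets the cut reach back into the insert.

```python
def mlyi_cuts(seq: str):
    """Blunt cut positions for MlyI, modeled as GAGTC(5/5).

    A returned position k means the duplex is split between seq[k-1] and
    seq[k]. Top-strand sites read GAGTC; bottom-strand sites read GACTC on
    the top strand, and their cut lies 5 nt to the left of the site.
    """
    seq = seq.upper()
    cuts = set()
    i = seq.find("GAGTC")
    while i != -1:
        k = i + 5 + 5  # past the 5-nt site, then 5 nt downstream
        if k <= len(seq):
            cuts.add(k)
        i = seq.find("GAGTC", i + 1)
    i = seq.find("GACTC")
    while i != -1:
        k = i - 5  # downstream of a bottom-strand site runs leftward
        if k >= 0:
            cuts.add(k)
        i = seq.find("GACTC", i + 1)
    return sorted(cuts)


# Toy fragment ending in an HGG motif (AGG), ligated to the MlyI-1 strand.
genomic = "TTTTTTTTAGG"
mlyi_1 = "GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG"
ligated = genomic + mlyi_1
cut = mlyi_cuts(ligated)[0]
print(ligated[:cut])  # TTTTTTTT
```

In this toy case the GAGTC site in the adapter lies on the bottom strand (GACTC on the top strand), so the cut falls 3 nt inside the genomic end and removes the terminal AGG together with the adapter, which is consistent with the HGG-removal step described above. The fragment sequence is invented for illustration.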

To make StlgR adapters, oligos stlgR (sequence 5′>3′, 5′Phos-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTGGATCCGATGC) (SEQ ID NO: 4401) and stlgRev (sequence 5′>3′, GGATCCAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4402) were admixed at 15 μM, heated to 98° C. for 3 min, then cooled slowly (0.1° C./min) to 60° C. StlgR adapters (5 pmol total) were added to HGG-removed DNA fragments and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((8) in FIG. 1). Ligations were then incubated with Hi-Fidelity 2× Master Mix (NEB), using 2 μM of both oligos T7-1 and gRU (sequence 5′>3′, AAAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4403), and amplified using 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.

In Vitro Transcription

The T7/gRU amplified library of PCR products was then used as template for in vitro transcription, using the HiScribe T7 In Vitro Transcription Kit (NEB). 500-1000 ng of template was incubated overnight at 37° C. according to the manufacturer's instructions. To transcribe the guide libraries into gRNAs, the following in vitro transcription reaction mixture was assembled: 10 μL of purified library (~500 ng), 6.5 μL of H₂O, 2.25 μL of ATP, 2.25 μL of CTP, 2.25 μL of GTP, 2.25 μL of UTP, 2.25 μL of 10× reaction buffer (NEB) and 2.25 μL of T7 RNA Polymerase mix. The reaction was incubated at 37° C. for 24 hr, then purified using the RNA cleanup kit (Life Technologies), eluted with 100 μL of RNase-free water, quantified and stored at −20° C. until use.
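The listed reaction components can be totaled to check the final reaction volume and, assuming the buffer is supplied at 10× as its name indicates, its working strength. This is simple arithmetic on the volumes given above, shown as a short sketch; the dictionary keys are just labels.

```python
# Component volumes (uL) for the in vitro transcription mix listed above.
ivt_mix_ul = {
    "purified library": 10.0,
    "H2O": 6.5,
    "ATP": 2.25,
    "CTP": 2.25,
    "GTP": 2.25,
    "UTP": 2.25,
    "10x reaction buffer": 2.25,
    "T7 RNA polymerase mix": 2.25,
}

total_ul = sum(ivt_mix_ul.values())
# Final strength of a 10x stock after dilution into the full reaction.
buffer_final_x = 10 * ivt_mix_ul["10x reaction buffer"] / total_ul

print(f"total volume: {total_ul} uL")               # total volume: 30.0 uL
print(f"buffer final strength: {buffer_final_x}x")  # buffer final strength: 0.75x
```

The listed volumes therefore add up to 30 μL, in which 2.25 μL of a 10× buffer corresponds to 0.75×.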

Example 2: Construction of gRNA Library from Intact Human Genomic DNA

Digestion of DNA

Human genomic DNA ((1) in FIG. 2; 20 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 40 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl₂, 100 μg/mL BSA) for 10 min at 37° C., then heat inactivated at 75° C. for 20 min. An additional 40 μL of NEB buffer 2 and 1 μL of T7 Endonuclease I (NEB) were added to the reaction, with 20 min incubation at 37° C. (e.g., (2) in FIG. 2). Fragmentation of genomic DNA was verified with a small aliquot by agarose gel electrophoresis. DNA fragments between 200 and 600 bp were recovered by adding 0.3× AxyPrep beads (Axygen), incubating at 25° C. for 5 min, capturing beads on a magnetic stand and transferring the supernatant to a new tube. DNA fragments below 600 bp do not bind to beads at this bead/DNA ratio and remain in the supernatant. 0.7× AxyPrep beads (Axygen) were then added to the supernatant (this binds all DNA molecules longer than 200 bp) and allowed to bind for 5 min. Beads were captured on a magnetic stand, washed twice with 80% ethanol, and air dried. DNA was then resuspended in 15 μL of 10 mM Tris-HCl pH 8. DNA concentration was determined using a Qubit assay (Life Technologies).
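The two bead additions act as a band-pass filter on fragment length. The sketch below treats the 600 bp and 200 bp figures given above as sharp cutoffs; that is a simplifying assumption (real bead-based size selection is gradual), and the function name `double_sided_selection` is hypothetical.

```python
def double_sided_selection(lengths_bp, upper_bp=600, lower_bp=200):
    """Idealized two-step bead size selection.

    Step 1 (0.3x beads): fragments of upper_bp or longer bind and are
    discarded with the beads; shorter fragments stay in the supernatant.
    Step 2 (0.7x more beads): fragments of lower_bp or longer bind and are
    kept; shorter ones are discarded with the supernatant.
    Real bead cutoffs are gradual, so exact-threshold handling is arbitrary.
    """
    supernatant = [n for n in lengths_bp if n < upper_bp]
    return [n for n in supernatant if n >= lower_bp]


print(double_sided_selection([80, 150, 200, 350, 599, 600, 1500]))
# [200, 350, 599]
```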

Ligation of Adapters

To make T7/MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4404) and T7-7 (sequence 5′>3′, GCCTCGAGC*T*A*ATACGACTCACTATAGGGATCCAAGTCCC, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) were admixed at 15 μM, heated to 98° C. for 3 min, then cooled slowly (0.1° C./min) to 30° C. The purified, Nt.CviPII/T7 Endonuclease I digested DNA (100 ng) was then ligated to 15 pmol of T7/MlyI adapters using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((3) in FIG. 2). Ligations were then amplified by 10 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s) using Hi-Fidelity 2× Master Mix (NEB) and 2 μM of both oligos T7-17 (GCCTCGAGC*T*A*ATACGACTCACTATAGGG, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4406) and Flag (sequence 5′>3′, CGCTTGTCGTCATCGTCTTTGTA) (SEQ ID NO: 4407). PCR amplification increases the yield of DNA and, given the nature of the Y-shaped adapters used, always resulted in the T7 promoter being added distal to the HGG site and the MlyI site being added next to the HGG motif ((4) in FIG. 2).

PCR products were then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., and heat inactivated at 75° C. for 20 min ((5) in FIG. 2). Following that, 5 pmol of adapter StlgR (described in Example 1) was ligated using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 2). Ligations were then amplified by PCR using Hi-Fidelity 2× Master Mix (NEB), 2 μM of both oligos T7-7 and gRU (described in Example 1), and 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.

Samples were then used as templates for in vitro transcription reactions as described in Example 1.

Example 3: Direct Cutting with CviPII

30 μg of human genomic DNA was digested with 2 units of NtCviPII (New England Biolabs) for 1 hour at 37° C., followed by heat inactivation at 75° C. for 20 minutes. The size of the fragments was verified to be 200-1,000 base pairs using a fragment analyzer instrument (Advanced Analytical). The 5′ or 3′ protruding ends (as shown, for example, in FIG. 3) were converted to blunt ends by adding 100 units of T4 DNA polymerase (New England Biolabs) and 100 μM dNTPs and incubating at 12° C. for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer. The DNA was then ligated to MlyI adapters (see, for example, Example 4) or BaeI/EcoP15I adapters (see, for example, Example 5).

Example 4: Use of MlyI Adapter

Adapter MlyI was made by combining 2 μmoles of MlyI Ad1 and MlyI Ad2 in 40 μL water. Adapter BsaXI/MmeI was made by combining 2 μmoles of oligo BsMm-Ad1 and 2 μmoles of oligo BsMm-Ad2 in 40 μL water. T7 adapter was made by combining 1.5 μmoles of T7-Ad1 and T7-Ad2 oligos in 100 μL water. Stem-loop adapter was made by combining 1.5 μmoles of gR-top and gR-bot oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min, then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.
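The annealing programs in these examples differ mainly in ramp rate, which sets how long the cooling step takes. As a rough planning aid (a sketch, not a thermal-cycler program), the ramp time follows from the temperature drop divided by the rate; room temperature is assumed here to be 25° C., and the function name `ramp_minutes` is hypothetical.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration of a linear cooling ramp from start_c down to end_c."""
    if rate_c_per_min <= 0 or end_c > start_c:
        raise ValueError("expected a positive rate and end_c <= start_c")
    return (start_c - end_c) / rate_c_per_min


# This example: 98 C to room temperature (assumed 25 C) at 1 C/min.
print(ramp_minutes(98, 25, 1.0))  # 73.0
# Examples 1 and 2: 98 C to 30 C at 0.1 C/min, roughly 680 min (~11 h).
print(round(ramp_minutes(98, 30, 0.1)))
```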

TABLE 5
Oligonucleotides used with MlyI Adapter.
SEQ ID NO | Oligo name | Sequence (5′>3′) | Modification
4408 | MlyI-Ad1 | gagatcagcttctgcattgatgccagcagcccgagtcag | none
4409 | MlyI-Ad2 | ctgactcgggctgctgtacaaagacgatgacgacaagcgtta | 5′ phosphate
4410 | BsMm-Ad1 | gagatcagcttctgcattgatgcGGAGCCGCAGTACACTATCCAAC | none
4411 | BsMm-Ad2 | GTTGGATAGTGTACTGCGGCTCCtacaaagacgatgacgacaagcg | 5′ phosphate
4412 | T7-Ad1 | gcctcgagctaatacgactcactatagagNN | none
4398 | T7-Ad2 | Ctctatagtgagtcgtatta | 5′ phosphate
4413 | gR-top | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt | 5′ phosphate
4414 | gR-bot | aaaaaagcaccgactcggtgccactttttcaagttgataacggactagccttattttaacttgctatttctagctctaaaac | none

The DNA containing the CCD blunt ends (from Example 3) was then ligated to 50 pmoles of adapter MlyI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). These steps eliminate small (<100 nucleotides) DNA and MlyI adapter dimers.

Purified DNA was then digested by adding 20 units of MlyI (New England Biolabs) and incubating at 37° C. for 1 hour to eliminate both the adapter-derived sequences and the CCD (and complementary HGG) motifs. DNA was recovered from the digest by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 30 μL buffer 4.

The purified DNA was then ligated to 50 pmoles of adapter BsaXI/MmeI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). DNA was then digested by addition of 20 units MmeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7 adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4, then digested with 20 units of BsaXI for 1 hour at 37° C. The guide RNA stem-loop sequences were added by adding 15 pmoles stem-loop adapter and using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 min. DNA was then recovered using a PCR cleanup kit (Zymo), eluted in 20 μL elution buffer and PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad1 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate guide RNAs.
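MmeI is commonly listed as TCCRAC(20/18), i.e. it cuts 20 nt beyond its recognition site, and the BsMm-Ad1 oligo in Table 5 ends in TCCAAC. This suggests why ligating this adapter and digesting with MmeI can be expected to leave a 20-nt stretch of the insert attached, a length matching a typical Cas9 targeting sequence. The sketch below illustrates only that length relationship; the function name `mmei_top_strand_20mers` and the toy insert are hypothetical, and the adapter is assumed to be joined with BsMm-Ad1 as the top strand.

```python
import re

MMEI_SITE = re.compile(r"TCC[AG]AC")  # TCCRAC, R = A or G


def mmei_top_strand_20mers(seq: str):
    """20-nt top-strand segments immediately 3' of each top-strand MmeI site.

    MmeI is modeled as TCCRAC(20/18): the top strand is cut 20 nt past the
    recognition site. Bottom-strand sites and the 2-nt 3' overhang are not
    modeled.
    """
    seq = seq.upper()
    segments = []
    for m in MMEI_SITE.finditer(seq):
        start = m.end()
        if start + 20 <= len(seq):
            segments.append(seq[start:start + 20])
    return segments


# BsMm-Ad1 (Table 5) joined to an invented 24-nt genomic end.
bsmm_ad1 = "GAGATCAGCTTCTGCATTGATGCGGAGCCGCAGTACACTATCCAAC"
genomic = "AAAAACCCCCGGGGGTTTTTACGT"
print(mmei_top_strand_20mers(bsmm_ad1 + genomic))  # ['AAAAACCCCCGGGGGTTTTT']
```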

Example 5: Use of BaeI/EcoP15I Adapter

Adapter BaeI/EcoP15I was made by combining 2 μmoles of BE Ad1 and BE Ad2 in 40 μL water. T7-E adapter was made by combining 1.5 μmoles of T7-Ad3 and T7-Ad4 oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min, then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.

TABLE 6
Oligonucleotides used with BaeI/EcoP15I Adapter.
SEQ ID NO | Oligo name | Sequence (5′>3′) | Modification
4415 | BE-Ad1 | ActgctgacACAAgtatcTTTTTTTTTTgtttaaacTTTTTTTTTTgatacACAAgtcagcagA | 5′ phosphate
4416 | BE-Ad2 | TagctgacTTGTgtatcAAAAAAAAAAgtttaaacAAAAAAAAAAgatacTTGTgtcagcagT | 5′ phosphate
12 | T7-Ad3 | gcctcgagctaatacgactcactatagag | none
4417 | T7-Ad4 | NNctctatagtgagtcgtatta | 5′ phosphate
4418 | stlgR | ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt | 5′ adenylation

The DNA containing the CCD blunt ends (from Example 3) was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.

DNA was then digested by addition of 20 units EcoP15I (New England Biolabs) and 1 mM ATP at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7-E adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4.

Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter-derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.
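BaeI is commonly listed with the degenerate recognition sequence ACNNNNGTAYC (Y = C or T), and the CCD/HGG motifs mentioned above are degenerate as well. A small, hypothetical helper (`find_motif`) that turns an IUPAC motif into a regular expression is one way to locate such sites; for example, the BE Ad1 oligo in Table 6 contains ACACAAGTATC, which fits the BaeI pattern. This sketch searches one strand only and says nothing about cleavage positions.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(seq: str, motif: str):
    """0-based start positions of a degenerate IUPAC motif in `seq`.

    A zero-width lookahead is used so overlapping occurrences are reported.
    Only the strand given is searched.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Stretch of the BE Ad1 oligo (Table 6) checked for the BaeI motif.
print(find_motif("GACACAAGTATC", "ACNNNNGTAYC"))  # [1]
print(find_motif("ACCAGTTGG", "CCD"))              # [1]
```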

Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase and 20 pmol stlgR oligo in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl₂, 1 mM DTT, 2.5 mM MnCl₂, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour, followed by heat inactivation at 90° C. for 5 min. The DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.

Example 6: NEMDA Method

NEMDA (Nicking Endonuclease Mediated DNA Amplification) was performed using 50 ng of human genomic DNA. The DNA was incubated in 100 μL thermo polymerase buffer (20 mM Tris-HCl, 10 mM (NH₄)₂SO₄, 10 mM KCl, 6 mM MgSO₄, 0.1% Triton® X-100, pH 8.8) supplemented with 0.3 mM dNTPs, 40 units of Bst large fragment DNA polymerase, and 0.1 units of NtCviPII (New England Biolabs) at 55° C. for 45 min, followed by 65° C. for 30 min and finally 80° C. for 20 min in a thermal cycler.

The DNA was then diluted with 300 μL of buffer 4 supplemented with 200 pmoles of T7-RND8 oligo (sequence 5′>3′ gcctcgagctaatacgactcactatagagnnnnnnnn) (SEQ ID NO: 4420) and boiled at 98° C. for 10 min, followed by rapid cooling to 10° C. for 5 min. The reaction was then supplemented with 40 units of E. coli DNA polymerase I and 0.1 mM dNTPs (New England Biolabs) and incubated at room temperature for 20 min, followed by heat inactivation at 75° C. for 20 min. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 30 μL elution buffer.
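The eight N positions at the 3′ end of T7-RND8 make it a pool of random primers rather than a single oligo. The size of that pool follows from the IUPAC degeneracy of each position, as in this small sketch (the function name `sequence_count` is hypothetical); it counts possible sequences only and does not model synthesis bias or how well each variant is represented.

```python
from math import prod

# Number of bases each IUPAC code stands for.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def sequence_count(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligo."""
    return prod(DEGENERACY[base] for base in oligo.upper())


t7_rnd8 = "gcctcgagctaatacgactcactatagagnnnnnnnn"
print(sequence_count(t7_rnd8))  # 65536
```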

DNA was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.

Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter-derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.

Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase and 20 pmol stlgR oligo in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl₂, 1 mM DTT, 2.5 mM MnCl₂, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour, followed by heat inactivation at 90° C. for 5 min. The DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 (sequence 5′>3′ gcctcgagctaatacgactcactatagag) (SEQ ID NO: 12) and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.

1. A collection of nucleic acids, the nucleic acids in the collection comprising: a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein the collection of nucleic acids comprises at least 10⁵ unique nucleic acid molecules.
2. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
3. The collection of claim 1, wherein the size of the second segment varies from 15-250 bp across the collection of nucleic acids.
4. The collection of claim 1, wherein at least 10% of the second segments in the collection are greater than 21 bp.
5. The collection of claim 1, wherein the size of the second segment is not 20 bp and is not 21 bp.
6. (canceled)
7. The collection of claim 1, wherein the collection of nucleic acids is a collection of DNA.
8. The collection of claim 7, wherein the second segment is single stranded DNA.
9. The collection of claim 7, wherein the third segment is single stranded DNA.
10. The collection of claim 7, wherein the third segment is double stranded DNA.
11. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region is a region capable of binding a transcription factor.
12. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region comprises a promoter.
13. The collection of claim 12, wherein the promoter is selected from the group consisting of T7, SP6, and T3.
14. The collection of claim 1, wherein the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome.
15. The collection of claim 1, wherein the targeting sequence is directed at repetitive or abundant DNA.
16. The collection of claim 1, wherein the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA.
17. The collection of claim 1, wherein the sequence of the second segments is selected from Table 3 and/or Table 4.
18. (canceled)
19. The collection of claim 1, wherein the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
20. The collection of claim 1, wherein the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism.
21. The collection of claim 20, wherein the PAM sequence is AGG, CGG, or TGG.
22. The collection of claim 20, wherein the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
23. The collection of claim 1, wherein the third segment comprises DNA encoding a gRNA stem-loop sequence.
24. The collection of claim 1, wherein the sequence of the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
25. The collection of claim 1, wherein the sequence of the third segment encodes for a crRNA and a tracrRNA.
26. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from a bacterial species.
27. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from an archaea species.
28. The collection of claim 2, wherein the CRISPR/Cas system protein is a Type I, Type II, or Type III protein.
29. The collection of claim 2, wherein the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9, and Cas9 nickase.
30. The collection of claim 1, wherein the third segment comprises DNA encoding a Cas9-binding sequence.
31. The collection of claim 1, wherein a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence.
32. The collection of claim 1, wherein the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
33.-192. (canceled)
193. A kit comprising the collection of nucleic acids of claim 1.
194.-235. (canceled)
236. The collection of claim 1, wherein at least 10% of the nucleic acids in the collection vary in size.
237. The collection of claim 1, further comprising a first segment comprising a regulatory region.
238. A collection of guide RNAs generated by transcribing the collection of claim 1.